Abstract. We analyse the computational complexity of finding Nash equilibria in turn-based
stochastic multiplayer games with ω-regular objectives. We show that restricting
the search space to equilibria whose payoffs fall into a certain interval may lead to
undecidability. In particular, we prove that the following problem is undecidable: Given a
game $\mathcal{G}$, does there exist a Nash equilibrium of $\mathcal{G}$ where player 0 wins with probability 1?
Moreover, this problem remains undecidable when restricted to pure strategies or (pure)
strategies with finite memory. One way to obtain a decidable variant of the problem is
to restrict the strategies to be positional or stationary. For the complexity of these two
problems, we obtain a common lower bound of NP and upper bounds of NP and PSPACE,
respectively. Finally, we single out a special case of the general problem that, in many
cases, admits an efficient solution. In particular, we prove that deciding the existence of
an equilibrium in which each player either wins or loses with probability 1 can be done in
polynomial time for games where the objective of each player is given by a parity condition
with a bounded number of priorities.



1. Introduction 

We study stochastic games [?] played by multiple players on a finite, directed graph.
Intuitively, a play of such a game evolves by moving a token along edges of the graph: Each 
vertex of the graph is either controlled by one of the players, or it is stochastic. Whenever 
the token arrives at a non-stochastic vertex, the player who controls this vertex must move 
the token to a successor vertex; when the token arrives at a stochastic vertex, a fixed 
probability distribution determines the next vertex. A measurable function maps plays to 
payoffs. In the simplest case, which we discuss here, the possible payoffs of a single play are
0 and 1 (i.e. each player either wins or loses a given play). However, due to the presence
of stochastic vertices, a player's expected payoff (i.e. her probability of winning) can be an
arbitrary probability.

Stochastic games with ω-regular objectives have been used as a formal model for the
verification and synthesis of reactive systems under the influence of random events [5]. Such
a system is usually modelled as a game between the system and its environment, where the 
environment's objective is the complement of the system's objective: the environment is 
considered hostile. Therefore, the research in this area has traditionally focused on two- 
player games where each play is won by precisely one of the two players, so-called two-player 
zero-sum games. However, the system may consist of several components with independent 
objectives, a situation which is naturally modelled by a multiplayer game. 

The most common interpretation of rational behaviour in multiplayer games is captured 
by the notion of a Nash equilibrium [54]. In a Nash equilibrium, no player can improve her 
payoff by unilaterally switching to a different strategy. Chatterjee et al. [11] gave an
algorithm for computing a Nash equilibrium in a stochastic multiplayer game with ω-regular
winning conditions. However, it can be shown that their algorithm may compute an equi- 
librium where all players lose almost surely (i.e. receive expected payoff 0), even when there 
exist other equilibria where all players win almost surely (i.e. receive expected payoff 1). 

In applications, one might look for an equilibrium where as many players as possible 
win almost surely or where it is guaranteed that the expected payoff of the equilibrium falls 
into a certain interval. Formulated as a decision problem, we want to know, given a $k$-player
game $\mathcal{G}$ with initial vertex $v_0$ and two thresholds $\bar{x}, \bar{y} \in [0,1]^k$, whether $(\mathcal{G}, v_0)$ has a Nash
equilibrium with expected payoff at least $\bar{x}$ and at most $\bar{y}$. This problem, which we call NE
for short, is a generalisation of the quantitative decision problem for two-player zero-sum
games, which asks whether in such a game player 0 has a strategy that ensures winning the
game with a probability that exceeds a given threshold.

The problem NE comes in several variants, depending on the type of strategies one 
considers: On the one hand, strategies may be randomised (allowing randomisation over 
actions) or pure (not allowing such randomisation). On the other hand, one can restrict 
to strategies that use (unbounded or bounded) finite memory or even to stationary ones 
(strategies that do not use any memory at all). For the quantitative decision problem, this 
distinction is often not meaningful since in a two-player zero-sum simple stochastic game 
with ω-regular objectives both players have optimal pure strategies with finite memory.
Moreover, in many games even positional (i.e. both pure and stationary) strategies suffice 
for optimality. However, regarding NE this distinction leads to distinct decision problems, 
which have to be analysed separately. 

Our main result is that NE is undecidable if we allow either arbitrary randomised 
strategies or arbitrary pure strategies. In fact, even the following, presumably simpler, 
problem is undecidable: Given a game $\mathcal{G}$, decide whether there exists a Nash equilibrium
(in pure strategies) where player 0 wins almost surely. Moreover, the problem remains
undecidable if one restricts to randomised or pure strategies with finite memory. 

If we restrict to simpler types of strategies like stationary ones, NE becomes decid- 
able. In particular, for positional strategies the problem is typically NP-complete, and for 
arbitrary stationary strategies it is NP-hard but typically contained in PSPACE. To get a
better understanding of the latter problem, we also relate it to the square root sum problem 
(SqrtSum) by providing a polynomial-time reduction from SqrtSum to NE with the restric- 
tion to stationary strategies. It is a long-standing open problem whether SqrtSum falls 
into the polynomial hierarchy; hence, showing that NE for stationary strategies lies inside 



THE COMPLEXITY OF NASH EQUILIBRIA IN STOCHASTIC MULTIPLAYER GAMES 




the polynomial hierarchy would imply a breakthrough in understanding the complexity of 
numerical computations. 

Finally, we prove decidability for an important fragment of NE, which we call the strictly 
qualitative fragment. This fragment arises from NE by restricting the two thresholds to be 
the same binary payoff. Hence, we are only interested in equilibria where each player either 
wins or loses with probability 1. Formally, the task is to decide, given a $k$-player game $\mathcal{G}$ with
initial vertex $v_0$ and a binary payoff $\bar{x} \in \{0,1\}^k$, whether the game has a Nash equilibrium
with expected payoff $\bar{x}$. Apart from proving decidability, we show that, depending on the
representation of the objective, this problem is typically complete for one of the complexity
classes P, NP, $\mathrm{P}^{\mathrm{NP}[\log]}$ and PSPACE, and that the problem is invariant under restricting the
search space to equilibria in pure finite-state strategies.

Outline. In Section 2, we introduce the model that underlies this work and survey earlier
work on stochastic two-player zero-sum games. In Section 3, we prove that every stochastic
multiplayer game has a Nash equilibrium, thereby addressing an inaccuracy in an earlier
proof by Chatterjee et al. [16]. In Section 4, we analyse the complexity of the problem NE
with respect to the six modes of strategies we consider in this work: positional strategies,
stationary strategies, pure finite-state strategies, randomised finite-state strategies,
arbitrary pure strategies, and arbitrary randomised strategies. Finally, in Section 5, we prove
that the strictly qualitative fragment of NE is decidable and analyse its complexity.



Related Work. Determining the complexity of Nash equilibria has attracted much interest 
in recent years. In particular, a series of papers culminated in the result that computing a 
Nash equilibrium of a two-player game in strategic form is complete for the complexity class 
PPAD [23, 18]. More in the spirit of our work, Conitzer and Sandholm [20] showed that
deciding whether there exists a Nash equilibrium in a two-player game in strategic form
where player 0 receives payoff at least $x$ and related decision problems are all NP-hard. For
non-stochastic infinite games, a qualitative version of the problem NE was studied in [?].
In particular, it was shown that the problem is NP-complete for games with parity winning
conditions but in P for games with Büchi winning conditions.

For stochastic games, most results concern the computation of values and optimal 
strategies; see Section 2 for a survey of the most important results. In the multiplayer
case, Chatterjee et al. [16] showed that the problem of deciding whether a (concurrent) 
stochastic game with reachability objectives has a Nash equilibrium in positional strategies 
with payoff at least x is NP-complete. We sharpen their hardness result by demonstrating 
that the problem remains NP-hard when it is restricted to games with only three players 
(as opposed to an unbounded number of players) where payoffs are assigned at terminal 
vertices only (cf. Theorem 4.4).

A more restricted model of stochastic games, where questions like ours have been studied,
is that of Markov decision processes (MDPs) with multiple objectives. These games can be
considered as stochastic games where only one player can influence the outcome of the
game. For MDPs with multiple ω-regular objectives, Etessami et al. [?] showed that questions
like the one we ask are decidable. Their result relies on the fact that, for MDPs with
multiple reachability objectives on terminal states, stationary strategies suffice to achieve a
payoff that is higher than a given threshold. Unfortunately, this property does not extend
to our model: we give an example of a stochastic game with the same kind of objectives
where every Nash equilibrium with payoff 1 for the first player requires infinite memory
(see Proposition 4.13).



2. Stochastic games 

2.1. Basic definitions. Let us start by giving a formal definition of the game model that 
underlies this paper. The games we are interested in are played by multiple players taken 
from a finite set $\Pi$ of players; we usually refer to them as player 0, player 1, player 2, and
so on. 

The arena of the game is basically a directed, coloured graph. Intuitively, the players 
take turns to form an infinite path through the arena, a play. Additionally, there is an 
element of chance involved: at some vertices, it is not a player who decides how to proceed 
but nature who chooses a successor vertex according to a probability distribution. To 
model this scenario, we partition the set $V$ of vertices into sets $V_i$ of vertices controlled by
player $i \in \Pi$ and a set of stochastic vertices, and we extend the edge relation to a transition
relation that takes probabilities into account. Formally, an arena for a game with players
in $\Pi$ consists of:

— a countable, non-empty set $V$ of vertices or states,

— for each player $i$ a set $V_i \subseteq V$ of vertices controlled by player $i$,

— a transition relation $\Delta \subseteq V \times ([0,1] \cup \{\bot\}) \times V$, and

— a colouring function $\chi\colon V \to C$ into an arbitrary set $C$ of colours.

We make the assumption that every vertex is controlled by at most one player: $V_i \cap V_j = \emptyset$
if $i \neq j$; vertices that are not controlled by a player are stochastic. For technical reasons,
we also assume that for each vertex $v$ the set

$v\Delta := \{w \in V : \text{there exists } p \in (0,1] \cup \{\bot\} \text{ such that } (v,p,w) \in \Delta\}$

of possible successor vertices is finite and non-empty. Moreover, we require that probabilities
appear only on transitions originating in stochastic vertices (if $v \in \bigcup_{i \in \Pi} V_i$ and $(v,p,w) \in \Delta$,
then $p = \bot$) and that they are unique: for every pair of a stochastic vertex $v$ and an
arbitrary vertex $w$ there exists precisely one $p \in [0,1]$ such that $(v,p,w) \in \Delta$; we denote this
probability by $\Delta(w \mid v)$. For computational purposes, we assume that these probabilities
are rational numbers. Finally, for each stochastic vertex $v$ the probabilities on outgoing
transitions must sum up to 1: $\sum_{w \in V} \Delta(w \mid v) = 1$. Hence, if $v$ is a stochastic vertex,
then the mapping $V \to [0,1]\colon w \mapsto \Delta(w \mid v)$ is a discrete probability distribution over $V$;
we denote the set of all discrete probability distributions over $V$ by $\mathcal{D}(V)$.
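
The well-formedness conditions on an arena can be checked mechanically. The following is a minimal sketch, not part of the original text: the class `Arena`, its field names and the constant `BOT` (standing for $\bot$) are hypothetical, and, as a simplifying assumption, transitions of probability 0 are simply omitted from `delta` instead of being stored explicitly.

```python
from dataclasses import dataclass
from fractions import Fraction

BOT = None  # hypothetical stand-in for the symbol ⊥ on player-controlled transitions


@dataclass
class Arena:
    vertices: set
    owner: dict   # vertex -> player; stochastic vertices have no entry
    delta: set    # triples (v, p, w) with p a Fraction or BOT
    colour: dict  # vertex -> colour

    def successors(self, v):
        """The set vΔ of possible successors of v."""
        return {w for (u, p, w) in self.delta
                if u == v and (p is BOT or p > 0)}

    def prob(self, w, v):
        """Δ(w | v) for a stochastic vertex v (0 if no triple is stored)."""
        return sum((p for (u, p, x) in self.delta if u == v and x == w),
                   Fraction(0))

    def check(self):
        """Return a list of violated well-formedness conditions."""
        errors = []
        for v in self.vertices:
            out = [(p, w) for (u, p, w) in self.delta if u == v]
            if not self.successors(v):
                errors.append(f"{v}: no successor")
            if v in self.owner:
                if any(p is not BOT for p, _ in out):
                    errors.append(f"{v}: probability on a controlled vertex")
            elif any(p is BOT for p, _ in out):
                errors.append(f"{v}: unlabelled stochastic transition")
            else:
                targets = [w for _, w in out]
                if len(targets) != len(set(targets)):
                    errors.append(f"{v}: probabilities are not unique")
                if sum(p for p, _ in out) != 1:
                    errors.append(f"{v}: probabilities do not sum to 1")
        return errors


# A three-vertex example: v and t belong to player 0, s is stochastic.
example = Arena(
    vertices={"v", "s", "t"},
    owner={"v": 0, "t": 0},
    delta={("v", BOT, "s"), ("v", BOT, "t"), ("t", BOT, "t"),
           ("s", Fraction(1, 2), "v"), ("s", Fraction(1, 2), "t")},
    colour={"v": "v", "s": "s", "t": "t"},
)
```

Rational probabilities are stored as `Fraction` values so that the sum-to-one test is exact, in line with the assumption that all probabilities are rational.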

The description of a game is completed by specifying an objective for each player. On an 
abstract level, these are just arbitrary sets of infinite sequences of colours, i.e. subsets of
$C^\omega$. Since we want to assign a probability to them, we assume that objectives are Borel
sets over the usual topology on infinite sequences, if not stated otherwise. Since objectives 
specify which plays are winning for a player, they are also called winning conditions. 

In general, we will identify an objective $\mathrm{Win} \subseteq C^\omega$ over colours with the corresponding
objective $\chi^{-1}(\mathrm{Win}) := \{\pi \in V^\omega : \chi(\pi) \in \mathrm{Win}\} \subseteq V^\omega$ over vertices (which is also Borel
since $\chi$, as a mapping $V^\omega \to C^\omega$, is continuous). The reason that we allow objectives to refer
to a colouring of the vertices is that the number of colours can be much smaller than the 
number of vertices, and it is possible that an objective can be represented more succinctly 
as an objective over colours rather than as an objective over vertices. 
If $\Pi$ is a finite set of players, $(V, (V_i)_{i \in \Pi}, \Delta, \chi)$ is an arena and $(\mathrm{Win}_i)_{i \in \Pi}$ is a collection
of objectives, we refer to the tuple $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, \chi, (\mathrm{Win}_i)_{i \in \Pi})$ as a stochastic
multiplayer game (SMG). An SMG is finite if the set $V$ of vertices is finite.

A play of $\mathcal{G}$ is an infinite path through the arena of $\mathcal{G}$, i.e. a sequence $\pi = \pi(0)\pi(1)\dots$
of vertices such that for each $k \in \mathbb{N}$ there exists $p \in (0,1] \cup \{\bot\}$ with $(\pi(k), p, \pi(k+1)) \in \Delta$.
Finite prefixes of plays are called histories. We say that a play $\pi$ of $\mathcal{G}$ is won by player $i$
if the corresponding sequence of colours fulfils player $i$'s objective, i.e. $\chi(\pi) \in \mathrm{Win}_i$; the
payoff of a play $\pi$ is the vector $\bar{x} \in \{0,1\}^\Pi$ defined by $x_i = 1$ if and only if $\chi(\pi) \in \mathrm{Win}_i$.

Often, it is convenient to designate an initial vertex $v_0 \in V$; we call the pair $(\mathcal{G}, v_0)$ an
initialised SMG. A play or a history of an initialised SMG $(\mathcal{G}, v_0)$ is just a play or, respectively,
a history of $\mathcal{G}$ that starts in $v_0$. In the following, we will refer to both SMGs and initialised
SMGs as SMGs; it should always be clear from the context whether the game is initialised 
or not. 

SMGs generalise various stochastic models, each of them the subject of intensive re- 
search. First, there are Markov chains, the basic model for stochastic processes, in which 
no control is possible. These are just SMGs where the set $\Pi$ of players is empty and
(consequently) there are only stochastic vertices. If we extend Markov chains by a single
controller, we arrive at the model of a Markov decision process (MDP), a model introduced
by Bellman [?] and heavily used in operations research. Formally, an MDP is an SMG where
there is only one player (and only one objective). Finally, in a (perfect-information)
stochastic two-player zero-sum game (S2G), there are only two players, player 0 and player 1,
who have opposing objectives: one player wants to fulfil an objective, while the other one
wants to prevent her from doing so. Hence, one player's objective is the complement of the
other player's objective. Due to their competitive nature, these games are also known as
competitive Markov decision processes [?].




The SMG model also incorporates several non-stochastic models. In particular, we 
call an SMG deterministic if it contains no stochastic vertices. In the two-player zero-sum 
setting, the resulting model has found applications in logic and controller synthesis, to name 
a few. 

2.2. Objectives. We have introduced objectives as abstract sets of infinite sequences. In
order to be amenable to algorithmic solutions, we need to restrict to a class of objectives
representable by finite objects. The objectives we consider for this purpose are standard
in logic and verification (see [?]); for all of them, we require that the set $C$ of colours
the objective refers to is finite. Moreover, whether an infinite sequence $\alpha$ fulfils such an
objective only depends on the set $\mathrm{Occ}(\alpha)$ of colours occurring in $\alpha$ or on the set $\mathrm{Inf}(\alpha)$ of
colours occurring infinitely often in $\alpha$. In particular, we deal with the following types of
objectives: 

— A reachability objective is given by a set $F \subseteq C$ of good colours, and the objective
requires that a good colour is seen at least once. The corresponding subset of $C^\omega$ is
$\mathrm{Reach}(F) := \{\alpha \in C^\omega : \mathrm{Occ}(\alpha) \cap F \neq \emptyset\}$.

— A Büchi objective is again given by a set $F \subseteq C$ of good colours, but it requires that a
good colour is seen infinitely often. The corresponding subset of $C^\omega$ is
$\text{Büchi}(F) := \{\alpha \in C^\omega : \mathrm{Inf}(\alpha) \cap F \neq \emptyset\}$.

— A co-Büchi objective is also given by a set $F \subseteq C$ of good colours; this time, the objective
requires that from some point onwards only good colours are seen. The corresponding
subset of $C^\omega$ is $\text{coBüchi}(F) = \{\alpha \in C^\omega : \mathrm{Inf}(\alpha) \subseteq F\}$.

— A parity objective is given by a priority function $\Omega\colon C \to \{0, \dots, d\}$, where $d \in \mathbb{N}$, which
assigns to each colour a certain priority. The objective requires that the least priority
that occurs infinitely often is even. The corresponding subset of $C^\omega$ is
$\mathrm{Parity}(\Omega) = \{\alpha \in C^\omega : \min(\mathrm{Inf}(\Omega(\alpha))) \text{ is even}\}$.

— A Streett objective is given by a set $\Omega$ of Streett pairs $(F, G)$, where $F, G \subseteq C$. The
objective requires that, for each of the pairs, if a colour on the left-hand side is seen
infinitely often, then so is a colour on the right-hand side. The corresponding subset
of $C^\omega$ is $\mathrm{Streett}(\Omega) = \{\alpha \in C^\omega : \mathrm{Inf}(\alpha) \cap F = \emptyset \text{ or } \mathrm{Inf}(\alpha) \cap G \neq \emptyset \text{ for all } (F, G) \in \Omega\}$.

— A Rabin objective is given by a set $\Omega$ of Rabin pairs $(F, G)$, where $F, G \subseteq C$; it requires
that for some pair a colour on the left-hand side is seen infinitely often while all colours
on the right-hand side are seen only finitely often. The corresponding subset of $C^\omega$ is
$\mathrm{Rabin}(\Omega) = \{\alpha \in C^\omega : \mathrm{Inf}(\alpha) \cap F \neq \emptyset \text{ and } \mathrm{Inf}(\alpha) \cap G = \emptyset \text{ for some } (F, G) \in \Omega\}$.
— A Muller objective is given by a family $\mathcal{F}$ of accepting sets $F \subseteq C$, and it requires that the
set of colours seen infinitely often equals one of these accepting sets. The corresponding
subset of $C^\omega$ is $\mathrm{Muller}(\mathcal{F}) = \{\alpha \in C^\omega : \mathrm{Inf}(\alpha) \in \mathcal{F}\}$.

Parity, Streett, Rabin and Muller objectives are of particular relevance because they provide
a standard form for arbitrary ω-regular objectives: any game with arbitrary ω-regular
objectives can be reduced to one with parity, Streett, Rabin or Muller objectives (over a
larger arena) by taking the product of its original arena with a suitable deterministic word
automaton for each player's objective [56].

In this work, for reasons that will become clear later, we are particularly interested
in objectives that are invariant under adding and removing finite prefixes; we call such
objectives prefix-independent. More formally, an objective is prefix-independent if for each
$\alpha \in C^\omega$ and $x \in C^*$ the sequence $\alpha$ satisfies the objective if and only if the sequence $x \cdot \alpha$ does.
Of the objectives listed above, only reachability objectives are, in general, not
prefix-independent. However, many of our results (in particular, many of our lower bounds) apply
to games with a prefix-independent form of reachability, which we call terminal reachability.
For these objectives, we assume that each vertex is coloured by itself, i.e. $C = V$, and $\chi$ is
the identity mapping. The terminal reachability objective for a set $F \subseteq V$ coincides with the
reachability objective for $F$, but we require that each $v \in F$ is a terminal vertex: $v\Delta = \{v\}$.
For any such set $F$, we have $\mathrm{Occ}(\pi) \cap F \neq \emptyset$ if and only if $\mathrm{Inf}(\pi) \cap F \neq \emptyset$ for every play $\pi$.
Hence, terminal reachability objectives can be regarded as prefix-independent objectives.

For S2Gs, the distinction between reachability and terminal reachability is not important:
every S2G with a reachability objective can easily be transformed into an equivalent
S2G with a reachability objective on terminal states. For SMGs, we believe that any such
transformation requires exponential time: deciding whether in a deterministic game with
terminal reachability objectives there exists a play that fulfils each of the objectives can
be done in polynomial time, whereas the same problem is NP-complete for deterministic [...]

[...] reachability objective can be viewed as a (co-)Büchi objective. Any (co-)Büchi objective
is equivalent to a parity objective with only two priorities, and any parity objective is
equivalent to both a Streett and a Rabin objective; in fact, the intersection (union) of two
parity objectives is equivalent to a Streett (Rabin) objective. Moreover, any Streett or
Rabin objective is equivalent to a Muller objective, although the translation from a set of
Streett/Rabin pairs to an equivalent family of accepting sets is, in general, exponential.
Finally, the complement of a Büchi (Streett) objective is equivalent to a co-Büchi (Rabin)
objective, and vice versa, whereas the complement of a parity (Muller) objective is also a
parity (Muller) objective. In fact, any objective that is equivalent to both a Streett and a
Rabin objective is equivalent to a parity objective [?].

Figure 1. A hierarchy of prefix-independent objectives.
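
Since each of the objectives of Section 2.2 depends only on $\mathrm{Occ}(\alpha)$ or $\mathrm{Inf}(\alpha)$, membership can be decided from these two sets alone. The sketch below is illustrative only: all function names are hypothetical, sequences are assumed to be ultimately periodic (a finite prefix followed by a repeated cycle), and the two translations at the end are standard constructions supplied here as assumptions, not quoted from the text.

```python
from itertools import combinations


def occ_inf(prefix, cycle):
    """Occ and Inf of the ultimately periodic sequence prefix · cycle^ω."""
    return set(prefix) | set(cycle), set(cycle)


def reach(F, occ):
    return bool(occ & F)


def buchi(F, inf):
    return bool(inf & F)


def co_buchi(F, inf):
    return inf <= F


def parity(priority, inf):
    return min(priority[c] for c in inf) % 2 == 0


def streett(pairs, inf):
    return all(not (inf & F) or bool(inf & G) for F, G in pairs)


def rabin(pairs, inf):
    return any(bool(inf & F) and not (inf & G) for F, G in pairs)


def muller(family, inf):
    return frozenset(inf) in family


def buchi_to_parity(F, colours):
    """Büchi(F) as a parity objective with priorities 0 and 1."""
    return {c: 0 if c in F else 1 for c in colours}


def co_buchi_to_parity(F, colours):
    """coBüchi(F) as a parity objective with priorities 1 and 2."""
    return {c: 2 if c in F else 1 for c in colours}


def parity_to_rabin(priority):
    """One Rabin pair per even priority k: see k, avoid everything below k."""
    pairs = []
    for k in sorted(set(priority.values())):
        if k % 2 == 0:
            F = {c for c, q in priority.items() if q == k}
            G = {c for c, q in priority.items() if q < k}
            pairs.append((F, G))
    return pairs


def nonempty_subsets(colours):
    """All candidate values of Inf over a finite colour set."""
    items = sorted(colours)
    return [set(s) for r in range(1, len(items) + 1)
            for s in combinations(items, r)]
```

Because $C$ is finite, $\mathrm{Inf}(\alpha)$ is always non-empty, so comparing two objectives on every non-empty subset of $C$ is an exhaustive check of their equivalence for small colour sets.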

To denote the class of SMGs (S2Gs) with a certain type of objectives, we prefix the 
name SMG (S2G) with the name(s) of the objective; for instance, we use the term Streett- 
Rabin SMG to denote SMGs where each player has a Streett or a Rabin objective. For 
S2Gs, we adopt the convention to name the objective of player 0 first; hence, in a Streett-Rabin
S2G player 0 has a Streett objective, while player 1 has a Rabin objective. Inspired
by Condon [19], we will refer to SMGs with terminal reachability objectives and S2Gs
with a (terminal) reachability objective for player 0 as simple stochastic multiplayer games
(SSMGs) and simple stochastic two-player zero-sum games (SS2Gs), respectively.

Drawing an SMG. When drawing an SMG as a graph, we will use the following conventions: 
The initial vertex is marked by a dangling incoming edge. Vertices that are controlled by a 
player are depicted as circles, where the player who controls the vertex is given by the label 
next to it. Stochastic vertices are depicted as diamonds, where the transition probabilities
are given by the labels on their outgoing edges (the default being equal probabilities on all
outgoing transitions). Finally, terminal vertices are generally represented by their associated
payoff vector. In fact, we allow arbitrary vectors of rational probabilities as payoffs. This 
does not increase the power of the model since such a payoff vector can easily be realised 
by an SSMG consisting of stochastic and terminal vertices only. 
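
One possible way to realise a rational payoff vector with a single stochastic vertex and terminal vertices is sketched below. This is an illustrative construction chosen here, not necessarily the one the authors have in mind; the function name `realise_payoff` is hypothetical, and all payoffs are assumed to lie in $[0, 1]$. The idea is to cut the unit interval at the requested payoffs and to let player $i$ win on exactly those pieces that lie below her payoff.

```python
from fractions import Fraction


def realise_payoff(payoff):
    """Return a list of (probability, 0/1 payoff vector) pairs.

    Each pair describes one terminal successor of a single stochastic
    vertex; player i's terminal reachability set consists of the
    terminals whose vector has a 1 in position i.
    """
    payoff = [Fraction(p) for p in payoff]
    cuts = sorted({Fraction(0), Fraction(1), *payoff})
    gadget = []
    for lo, hi in zip(cuts, cuts[1:]):
        # On the piece [lo, hi), player i wins iff the piece lies below payoff[i].
        winners = tuple(int(p >= hi) for p in payoff)
        gadget.append((hi - lo, winners))
    return gadget


def expected_payoff(gadget, i):
    """Probability that player i wins in the gadget."""
    return sum(prob for prob, winners in gadget if winners[i] == 1)


gadget = realise_payoff([Fraction(1, 3), Fraction(3, 4)])
```

For $k$ players this uses at most $k + 1$ terminal vertices, since the unit interval is cut in at most $k$ interior points.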



2.3. Strategies and strategy profiles. 

2.3.1. Randomised and pure strategies. The notion of a strategy lies at the heart of game
theory. Formally, a (randomised) strategy of player $i$ in an SMG $\mathcal{G}$ is a mapping $\sigma\colon V^*V_i \to
\mathcal{D}(V)$ assigning to each sequence $xv \in V^*V_i$ of vertices ending in a vertex controlled by
player $i$ a discrete probability distribution over $V$ such that $\sigma(xv)(w) > 0$ only if $(v, \bot, w) \in
\Delta$. Instead of $\sigma(xv)(w)$, we usually write $\sigma(w \mid xv)$. We say that a play $\pi$ of $\mathcal{G}$ is compatible
with a strategy $\sigma$ of player $i$ if $\sigma(\pi(k+1) \mid \pi(0) \dots \pi(k)) > 0$ for all $k \in \mathbb{N}$ with $\pi(k) \in V_i$.
Similarly, a history $v_0 \dots v_k$ is compatible with $\sigma$ if $\sigma(v_{j+1} \mid v_0 \dots v_j) > 0$ for all $0 \le j < k$
with $v_j \in V_i$.

A (randomised) strategy profile of $\mathcal{G}$ is a tuple $\bar\sigma = (\sigma_i)_{i \in \Pi}$ where $\sigma_i$ is a strategy of
player $i$ in $\mathcal{G}$. We say that a play or a history of $\mathcal{G}$ is compatible with a strategy profile $\bar\sigma$
if it is compatible with each $\sigma_i$. Given a strategy profile $\bar\sigma = (\sigma_j)_{j \in \Pi}$ and a strategy $\tau$ of
player $i$, we denote by $(\bar\sigma_{-i}, \tau)$ the strategy profile resulting from $\bar\sigma$ by replacing $\sigma_i$ with $\tau$.

A strategy $\sigma$ of player $i$ is called pure or deterministic if for each $xv \in V^*V_i$ there exists
$w \in V$ with $\sigma(w \mid xv) = 1$; note that a pure strategy of player $i$ can be identified with a
function $\sigma\colon V^*V_i \to V$. A strategy profile $\bar\sigma = (\sigma_i)_{i \in \Pi}$ is called pure (or deterministic) if
each $\sigma_i$ is pure.

2.3.2. The probability space induced by a strategy profile. Given a game $\mathcal{G}$ and a strategy
profile $\bar\sigma = (\sigma_i)_{i \in \Pi}$ of $\mathcal{G}$, the conditional probability of $w \in V$ given $xv \in V^*V$ is the number
$\sigma_i(w \mid xv)$ if $v \in V_i$ and the probability $\Delta(w \mid v)$ if $v$ is a stochastic vertex; let us denote this
probability by $\bar\sigma(w \mid xv)$. Given an initial vertex $v_0 \in V$, the probabilities $\bar\sigma(w \mid xv)$ give
rise to a probability measure: the probability of a basic cylinder set $v_0 \dots v_k \cdot V^\omega$ equals the
product $\prod_{j=0}^{k-1} \bar\sigma(v_{j+1} \mid v_0 \dots v_j)$; basic cylinder sets that start in a vertex different from $v_0$
have probability 0. This definition induces a probability measure on the algebra of cylinder
sets, which, by Carathéodory's extension theorem, can be extended to a probability measure
on the Borel $\sigma$-algebra over $V^\omega$; we denote the extended measure by $\Pr^{\bar\sigma}_{v_0}$. Finally, by
viewing the colouring function $\chi\colon V \to C$ as a continuous function $V^\omega \to C^\omega$, we obtain
a probability measure on the Borel $\sigma$-algebra over $C^\omega$; we abuse notation and denote this
measure also by $\Pr^{\bar\sigma}_{v_0}$.
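
The product formula for basic cylinder sets translates directly into code. The following is a minimal sketch under assumptions of our own: strategies are modelled as Python functions from histories (tuples of vertices) to dictionaries of successor probabilities, `delta_prob` maps pairs `(v, w)` to $\Delta(w \mid v)$, and all names are hypothetical.

```python
from fractions import Fraction


def history_probability(history, v0, owner, delta_prob, strategies):
    """Probability of the basic cylinder set history · V^ω from v0."""
    if history[0] != v0:
        return Fraction(0)  # cylinders starting elsewhere have probability 0
    prob = Fraction(1)
    for j in range(len(history) - 1):
        prefix = tuple(history[:j + 1])
        v, w = history[j], history[j + 1]
        if v in owner:
            # Controlled vertex: ask the owner's strategy for σ_i(w | prefix).
            step = strategies[owner[v]](prefix).get(w, Fraction(0))
        else:
            # Stochastic vertex: use the fixed distribution Δ(w | v).
            step = delta_prob.get((v, w), Fraction(0))
        prob *= step
    return prob


# Example: 'a' belongs to player 0; 's' and the terminal 't' are stochastic.
owner = {"a": 0}
delta_prob = {("s", "a"): Fraction(1, 2), ("s", "t"): Fraction(1, 2),
              ("t", "t"): Fraction(1)}


def sigma0(history):
    """A stationary randomised strategy of player 0 (ignores the history)."""
    return {"s": Fraction(2, 3), "t": Fraction(1, 3)}


strategies = {0: sigma0}
p = history_probability(("a", "s", "a", "t"), "a", owner, delta_prob, strategies)
```

In the example, the history $a\,s\,a\,t$ has probability $\tfrac{2}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{9}$.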

For a strategy profile $\bar\sigma$, we are mainly interested in the probabilities $p_i := \Pr^{\bar\sigma}_{v_0}(\mathrm{Win}_i)$
of winning. We call $p_i$ the (expected) payoff of $\bar\sigma$ for player $i$ (from $v_0$) and the vector
$(p_i)_{i \in \Pi}$ the (expected) payoff of $\bar\sigma$ (from $v_0$). Finally, we say that a history $xv$ of $(\mathcal{G}, v_0)$ is
consistent with $\bar\sigma$ if $\Pr^{\bar\sigma}_{v_0}(xv \cdot V^\omega) > 0$, i.e. if the basic cylinder induced by this history has
positive probability.

In order to apply known results about Markov chains, we can also view the stochastic
process induced by a strategy profile $\bar\sigma$ as a countable Markov chain $\mathcal{G}^{\bar\sigma}$, defined as follows:
The set of states of $\mathcal{G}^{\bar\sigma}$ is the set $V^+$ of all non-empty sequences of vertices in $\mathcal{G}$. The
only transitions from a state $xv$, where $x \in V^*$, $v \in V$, are to states of the form $xvw$,
where $w \in V$, and such a transition occurs with probability $p > 0$ if and only if either $v$
is stochastic and $(v, p, w) \in \Delta$ or $v \in V_i$ and $\sigma_i(w \mid xv) = p$. Finally, the colouring $\chi$ of
vertices is extended to a colouring of states by setting $\chi(xv) = \chi(v)$ for all $x \in V^*$ and
$v \in V$. With this definition, we could equivalently define the payoff of $\bar\sigma$ for player $i$ as the
probability of the event $\chi^{-1}(\mathrm{Win}_i)$ in $(\mathcal{G}^{\bar\sigma}, v_0)$.

For each player $i$, the Markov decision process $\mathcal{G}^{\bar\sigma_{-i}}$ is defined just as $\mathcal{G}^{\bar\sigma}$, but states
$xv \in V^*V_i$ are controlled by player $i$ (the unique player in $\mathcal{G}^{\bar\sigma_{-i}}$), and there is a transition
from such a state to each state of the form $xvw$, where $w \in V$, with $(v, \bot, w) \in \Delta$; player $i$'s
objective is the same as in $\mathcal{G}$.

2.3.3. Strategies with memory. A memory structure for a game $\mathcal{G}$ with vertices in $V$ is a
triple $\mathfrak{M} = (M, \delta, m_0)$ where $M$ is a set of memory states, $\delta\colon M \times V \to M$ is the update
function, and $m_0 \in M$ is the initial memory. A (randomised) strategy with memory $\mathfrak{M}$ of
player $i$ is a function $\sigma\colon M \times V_i \to \mathcal{D}(V)$ such that $\sigma(m, v)(w) > 0$ only if $w \in v\Delta$. The
strategy $\sigma$ is a pure strategy with memory $\mathfrak{M}$ if additionally the following property holds:
for all $m \in M$ and $v \in V_i$ there exists $w \in V$ such that $\sigma(m, v)(w) = 1$. Hence, a pure
strategy with memory $\mathfrak{M}$ can be described by a function $\sigma\colon M \times V_i \to V$. Finally, a (pure)
strategy profile with memory $\mathfrak{M}$ is a tuple $\bar\sigma = (\sigma_i)_{i \in \Pi}$ such that each $\sigma_i$ is a (pure) strategy
with memory $\mathfrak{M}$ of player $i$.

A (pure) strategy $\sigma$ with memory $\mathfrak{M}$ of player $i$ defines a (pure) strategy of player $i$ in the
usual sense as follows: Let $\delta^*(x)$ be the memory state after $x \in V^*$, defined inductively by
$\delta^*(\varepsilon) = m_0$ and $\delta^*(xv) = \delta(\delta^*(x), v)$ for $x \in V^*$ and $v \in V$. If $v \in V_i$, then the distribution
(successor vertex) chosen by the strategy $\sigma$ for the sequence $xv$ is $\sigma(\delta^*(x), v)$. Vice versa,
every strategy (profile) of $\mathcal{G}$ can be viewed as a strategy (profile) with memory $\mathfrak{M} :=
(V^*, \cdot, \varepsilon)$.
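
The passage from a strategy with memory to an ordinary history-based strategy can be sketched as follows. The helper names `delta_star` and `induced_strategy` and the two-state example are hypothetical and only meant to illustrate the inductive definition of $\delta^*$; the example strategy is pure, so it returns a successor vertex.

```python
def delta_star(update, m0, x):
    """Memory state δ*(x) reached after reading the sequence x from m0."""
    m = m0
    for v in x:
        m = update(m, v)
    return m


def induced_strategy(sigma, update, m0):
    """Turn a strategy with memory into a strategy on histories xv."""
    def strategy(history):
        *x, v = history
        # The choice at xv depends on the memory after x and on v itself.
        return sigma(delta_star(update, m0, x), v)
    return strategy


# Example: memory {0, 1} flips whenever vertex 'v' is read; the pure
# strategy at 'v' alternates between the successors 'l' and 'r'.
def update(m, u):
    return 1 - m if u == "v" else m


def sigma(m, v):
    return "l" if m == 0 else "r"


alternate = induced_strategy(sigma, update, 0)
```

With these definitions, `alternate(("v",))` is `"l"` and `alternate(("v", "l", "v"))` is `"r"`: the memory has been flipped once by the first visit to `v`.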

A finite-state strategy (profile) is a strategy (profile) with memory $\mathfrak{M}$ for a finite memory
structure $\mathfrak{M}$. Note that a strategy profile is finite-state if and only if each of its strategies is
finite-state. If $|M| = 1$, we call a strategy (profile) with memory $\mathfrak{M}$ stationary. Moreover,
we call a pure stationary strategy (profile) a positional strategy (profile). A stationary
strategy of player $i$ can be described by a function $\sigma\colon V_i \to \mathcal{D}(V)$, and a positional strategy
even by a function $\sigma\colon V_i \to V$.

If $\bar\sigma = (\sigma_i)_{i \in \Pi}$ is a strategy profile with memory $\mathfrak{M}$, we coarsen the Markov chain $\mathcal{G}^{\bar\sigma}$
by taking $M \times V$ as its domain. The transition relation is defined as follows: there is
a transition from $(m, v)$ to $(n, w)$ with probability $p > 0$ if and only if $\delta(m, v) = n$ and
either $v$ is a stochastic vertex of $\mathcal{G}$ and $(v, p, w) \in \Delta$ or $v \in V_i$ and $\sigma_i(m, v)(w) = p$. Finally,
a state $(m, v)$ has the same colour as the vertex $v$ in $\mathcal{G}$. Analogously, we coarsen the Markov
decision process $\mathcal{G}^{\bar\sigma_{-i}}$ by using $M \times V$ as its domain: vertices $(m, v) \in M \times V_i$ are controlled
by player $i$, and there is a transition from such a vertex $(m, v)$ to $(n, w) \in M \times V$ if and
only if $n = \delta(m, v)$ and $(v, \bot, w) \in \Delta$. Note that the arenas of both $\mathcal{G}^{\bar\sigma}$ and $\mathcal{G}^{\bar\sigma_{-i}}$ are finite
if the memory $\mathfrak{M}$ and the original arena of $\mathcal{G}$ are finite.
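
For a finite memory structure and a finite arena, the coarsened Markov chain on $M \times V$ can be built explicitly. The sketch below follows the transition rule stated above; the function `product_chain`, the dictionary-based encoding of distributions and the small example are assumptions made for illustration.

```python
from fractions import Fraction


def product_chain(vertices, owner, delta_prob, memory_states, update, sigma):
    """Transition probabilities of the coarsened chain on M × V.

    sigma maps each player to a function (m, v) -> {successor: probability};
    delta_prob maps pairs (v, w) to Δ(w | v) for stochastic v.
    """
    chain = {}
    for m in memory_states:
        for v in vertices:
            n = update(m, v)  # the memory component of every successor
            if v in owner:
                dist = sigma[owner[v]](m, v)
            else:
                dist = {w: p for (u, w), p in delta_prob.items() if u == v}
            chain[(m, v)] = {(n, w): p for w, p in dist.items() if p > 0}
    return chain


# Example: player 0 controls 'v' and alternates between 'l' and 'r';
# both 'l' and 'r' are stochastic and return to 'v' with probability 1.
vertices = {"v", "l", "r"}
owner = {"v": 0}
delta_prob = {("l", "v"): Fraction(1), ("r", "v"): Fraction(1)}


def update(m, u):
    return 1 - m if u == "v" else m


def sigma0(m, v):
    return {"l": Fraction(1)} if m == 0 else {"r": Fraction(1)}


chain = product_chain(vertices, owner, delta_prob, {0, 1}, update, {0: sigma0})
```

The resulting chain has $|M| \cdot |V| = 6$ states, which illustrates why finiteness of the memory and of the arena carries over to the product.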

2.3.4. Residual games and strategies. Given an SMG $\mathcal{G}$ and a sequence $x \in V^*$ (which is
usually a history), the residual game $\mathcal{G}[x]$ has the same arena as $\mathcal{G}$ but different objectives:
if $\mathrm{Win}_i \subseteq C^\omega$ is the objective of player $i$ in $\mathcal{G}$, then her objective in $\mathcal{G}[x]$ is $\chi(x)^{-1}\mathrm{Win}_i :=
\{\alpha \in C^\omega : \chi(x) \cdot \alpha \in \mathrm{Win}_i\}$. In particular, if all objectives in $\mathcal{G}$ are prefix-independent, then
$\mathcal{G}[x] = \mathcal{G}$.

If player $i$ plays according to a strategy $\sigma$ in $\mathcal{G}$, then the corresponding strategy in $\mathcal{G}[x]$ is
the residual strategy $\sigma[x]$, defined by $\sigma[x](yv) = \sigma(xyv)$. If $\bar\sigma = (\sigma_i)_{i \in \Pi}$ is a strategy profile,
then the residual strategy profile $\bar\sigma[x]$ is just the profile of the residual strategies $\sigma_i[x]$. The
following lemma, taken from [64], shows how to compute probabilities with respect to a
residual strategy profile.

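Lemma 2.1. Let $\bar\sigma$ be a strategy profile of an SMG $(\mathcal{G}, v_0)$, and let $xv \in V^*V$. If $X \subseteq V^\omega$
is a Borel set, then $\Pr^{\bar\sigma}_{v_0}(X \cap xv \cdot V^\omega) = \Pr^{\bar\sigma}_{v_0}(xv \cdot V^\omega) \cdot \Pr^{\bar\sigma[x]}_{v}(x^{-1}X)$.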

2.4. Subarenas and end components. Algorithms for stochastic games often employ a
divide-and-conquer approach and compute a solution for a complex game from the solution
of several smaller games. These smaller games are usually obtained from the original game
by restricting to a subarena. Formally, given an SMG $\mathcal{G}$, a set $U \subseteq V$ is a subarena if

— $U \neq \emptyset$,

— $v\Delta \cap U \neq \emptyset$ for each $v \in U$, and

— $v\Delta \subseteq U$ for each stochastic vertex $v \in U$.

Clearly, if $U$ is a subarena, then the restriction of $\mathcal{G}$ to vertices in $U$ is again an SMG, which
we denote by $\mathcal{G} \upharpoonright U$. Formally,

$\mathcal{G} \upharpoonright U := (\Pi, U, (V_i \cap U)_{i \in \Pi}, \Delta \cap (U \times ([0,1] \cup \{\bot\}) \times U), \chi_U, (\mathrm{Win}_i)_{i \in \Pi})$,

where $\chi_U\colon U \to C\colon u \mapsto \chi(u)$ is the restriction of the colouring function to $U$.

Of particular interest are the strongly connected subarenas of a game because they can
arise as the sets $\mathrm{Inf}(\pi)$ of vertices visited infinitely often in a play; we call these sets end
components. Formally, a set $U \subseteq V$ is an end component if $U$ is a subarena and every
vertex $w \in U$ is reachable from every other vertex $v \in U$, i.e. there exists a sequence
$v = v_1, v_2, \ldots, v_n = w$ such that $v_{i+1} \in v_i\Delta$ for each $0 < i < n$. An end component $U$
is maximal in a set $S \subseteq V$ if there is no end component $U'$ such that $U \subsetneq U' \subseteq S$. For
any finite subset $S \subseteq V$, the set of all end components maximal in $S$ can be computed in
quadratic time [24].
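The definition above suggests a simple refinement procedure, sketched here in Python under stated assumptions: the arena is given as a successor map `succ` (each vertex mapped to the set $v\Delta$), `stochastic` is the set of stochastic vertices, and `region` plays the role of $S$. All names are hypothetical, and the sketch makes no attempt to meet the quadratic time bound cited above; it only illustrates the alternation between pruning vertices that violate the subarena conditions and splitting into strongly connected components.

```python
from typing import Dict, List, Set


def strongly_connected_components(vertices: Set[str],
                                  succ: Dict[str, Set[str]]) -> List[Set[str]]:
    """SCCs of the graph induced on `vertices` (iterative Kosaraju)."""
    order, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root] & vertices))]
        while stack:
            v, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                order.append(v)          # v is finished: record post-order
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt] & vertices)))
    pred = {v: set() for v in vertices}
    for v in vertices:
        for w in succ[v] & vertices:
            pred[w].add(v)
    components, assigned = [], set()
    for root in reversed(order):         # second pass on the reversed graph
        if root in assigned:
            continue
        comp, todo = set(), [root]
        assigned.add(root)
        while todo:
            v = todo.pop()
            comp.add(v)
            for u in pred[v] - assigned:
                assigned.add(u)
                todo.append(u)
        components.append(comp)
    return components


def maximal_end_components(succ: Dict[str, Set[str]],
                           stochastic: Set[str],
                           region: Set[str]) -> List[Set[str]]:
    """End components that are maximal inside `region`."""
    result, worklist = [], [set(region)]
    while worklist:
        cand = worklist.pop()
        changed = True
        while changed:                   # enforce the subarena conditions
            changed = False
            for v in list(cand):
                inside = succ[v] & cand
                leaves = v in stochastic and inside != succ[v]
                if leaves or not inside:
                    cand.discard(v)
                    changed = True
        if not cand:
            continue
        comps = strongly_connected_components(cand, succ)
        if len(comps) == 1:              # closed and strongly connected
            result.append(cand)
        else:
            worklist.extend(comps)       # refine each SCC separately
    return result
```

For instance, with `succ = {"a": {"b", "t"}, "b": {"a"}, "t": {"t"}, "s": {"a", "t"}}` and stochastic vertices `{"b", "s"}`, the procedure returns the two sets `{"a", "b"}` and `{"t"}`; the vertex `s` lies in no end component because it has no successor inside its own component.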



The theory of end components has been developed by de Alfaro [2J, 25] and Courcoubetis
and Yannakakis [21, 22]. The central fact about end components in finite SMGs is
that, under any strategy profile, the set of vertices visited infinitely often is almost surely
an end component.

Lemma 2.2. Let $\mathcal{G}$ be a finite SMG. Then $\Pr^{\sigma}_{v}(\{\pi \in V^\omega : \mathrm{Inf}(\pi) \text{ is an end component}\}) = 1$
for each strategy profile $\sigma$ of $\mathcal{G}$ and each $v \in V$.

Moreover, for any end component $U$, we can construct a stationary strategy profile, or
alternatively a pure finite-state strategy profile, that, when started in $U$, guarantees almost
surely to visit all and only the vertices in $U$ infinitely often. In fact, the stationary profile
that chooses for each vertex in $U$ a successor in $U$ uniformly at random fulfils this property.

Lemma 2.3. Let $\mathcal{G}$ be a finite SMG and $U$ one of its end components. There exists both a
stationary and a pure finite-state strategy profile $\sigma$ such that $\Pr^{\sigma}_{v}(\{\pi \in V^\omega : \mathrm{Inf}(\pi) = U\}) = 1$
for every vertex $v \in U$.

Given an SMG $\mathcal{G}$ with (objectives representable as) Muller objectives given by a family
$\mathcal{F}_i$ of accepting sets, we say that an end component $U$ is winning for player $i$ if $\chi(U) \in \mathcal{F}_i$;
the payoff of $U$ is the vector $z \in \{0, 1\}^\Pi$, defined by $z_i = 1$ if and only if $U$ is winning for
player $i$.



2.5. Values, determinacy and optimal strategies. Given a strategy $\tau$ of player $i$ in $\mathcal{G}$
and a vertex $v \in V$, the value of $\tau$ from $v$ is the number $\mathrm{val}^{\tau}(v) := \inf_{\sigma} \Pr^{\sigma_{-i},\tau}_{v}(\mathrm{Win}_i)$,
where $\sigma$ ranges over all strategy profiles of $\mathcal{G}$. Moreover, the value of $\mathcal{G}$ for player $i$ from $v$
is the supremum of these values: $\mathrm{val}^{\mathcal{G}}_i(v) := \sup_{\tau} \mathrm{val}^{\tau}(v)$, where $\tau$ ranges over all strategies
of player $i$ in $\mathcal{G}$. Intuitively, $\mathrm{val}^{\mathcal{G}}_i(v)$ is the maximal payoff that player $i$ can ensure when
the game starts from $v$.

Given an initial vertex $v_0 \in V$, a strategy $\tau$ of player $i$ in $\mathcal{G}$ is called (almost-surely)
winning if $\mathrm{val}^{\tau}(v_0) = 1$. More generally, $\tau$ is called optimal if $\mathrm{val}^{\tau}(v_0) = \mathrm{val}^{\mathcal{G}}_i(v_0)$. For $\varepsilon > 0$,
it is called $\varepsilon$-optimal if $\mathrm{val}^{\tau}(v_0) \geq \mathrm{val}^{\mathcal{G}}_i(v_0) - \varepsilon$. A globally ($\varepsilon$-)optimal strategy is a strategy
that is ($\varepsilon$-)optimal for every possible initial vertex $v_0 \in V$. Note that optimal strategies
need not exist since the supremum in the definition of $\mathrm{val}^{\mathcal{G}}_i$ is not necessarily attained;
in this case, only $\varepsilon$-optimal strategies exist. Also note that there exists a globally ($\varepsilon$-)optimal
strategy whenever there exists an ($\varepsilon$-)optimal strategy for every possible initial
vertex. Finally, we say that a strategy $\tau$ of player $i$ in $(\mathcal{G}, v_0)$ is strongly optimal if the
residual strategy $\tau[x]$ is optimal in the residual game $(\mathcal{G}[x], v)$ for every history $xv$ of $(\mathcal{G}, v_0)$
that is compatible with $\tau$. Intuitively, a strategy is strongly optimal if it is also optimal
when the other players do not play optimally. Note that, for games with prefix-independent
objectives, any globally optimal positional strategy profile is also strongly optimal.

Determining values and finding optimal strategies in SMGs actually reduces to performing
the same tasks in S2Gs. Formally, given an SMG $\mathcal{G}$, define for each player $i$ the
coalition game $\mathcal{G}_i$ to be the same game as $\mathcal{G}$ but with only two players: player $i$ acting as
player 0 and the coalition $\Pi \setminus \{i\}$ acting as player 1. The coalition controls all vertices that
in $\mathcal{G}$ are controlled by some player $j \neq i$, and its objective is the complement of player $i$'s
objective in $\mathcal{G}$. Clearly, $\mathcal{G}_i$ is an S2G, and $\mathrm{val}^{\mathcal{G}_i}(v) = \mathrm{val}^{\mathcal{G}}_i(v)$ for every vertex $v$. Moreover,
any (strongly, $\varepsilon$-) optimal strategy for player $i$ in $(\mathcal{G}, v_0)$ is (strongly, $\varepsilon$-) optimal in $(\mathcal{G}_i, v_0)$,
and vice versa. Hence, when we study values and optimal strategies, we can restrict to S2Gs.

A celebrated theorem due to Martin [48] and Maitra and Sudderth [47] states that S2Gs
with Borel objectives are determined: $\mathrm{val}^{\mathcal{G}}_0 = 1 - \mathrm{val}^{\mathcal{G}}_1$. The number $\mathrm{val}^{\mathcal{G}}(v) := \mathrm{val}^{\mathcal{G}}_0(v)$ is
consequently called the value of $\mathcal{G}$ from $v$. In fact, an inspection of the proof shows that
for turn-based games both players not only have randomised $\varepsilon$-optimal strategies but pure
$\varepsilon$-optimal strategies.

Theorem 2.4 ([48, 47]). Every S2G with Borel objectives is determined; for all $\varepsilon > 0$, both
players have $\varepsilon$-optimal pure strategies.

For finite S2Gs with prefix-independent objectives, we can show a stronger result than
Theorem 2.4: in these games, both players not only have $\varepsilon$-optimal pure strategies but
optimal ones [36]. In fact, the proof reveals the existence of strongly optimal strategies (see
also [5£).

Theorem 2.5 ([36]). In any finite S2G with prefix-independent objectives, both players
have strongly optimal pure strategies.

For finite S2Gs with $\omega$-regular objectives, more attractive strategies than arbitrary pure
strategies suffice for optimality. In particular, in any finite Rabin-Streett S2G there exists
a globally optimal positional strategy for player 0 [46, 17].

Theorem 2.6 ([46, 17]). In any finite Rabin-Streett S2G, player 0 has a globally optimal
positional strategy.

A consequence of Theorem 2.6 is that the values of a finite Rabin-Streett S2G are
rational of polynomial bit complexity in the size of the arena: Given a positional strategy
profile $\sigma$ of $\mathcal{G}$, the finite MDP $\mathcal{G}^{\sigma_0}$ is not larger than the game $\mathcal{G}$. Moreover, if $\sigma_0$ is globally
optimal, then for every vertex $v$ the value of $\mathcal{G}$ from $v$ and the value of $\mathcal{G}^{\sigma_0}$ from $v$ sum up
to 1. But the values of any Streett MDP form the optimal solution of a linear programme
of polynomial size (see [2J]) and are therefore rational of small bit complexity.

Of course, it also follows from Theorem 2.6 that finite parity S2Gs are positionally
determined: both players have globally optimal positional strategies. This result was first
proven for deterministic games (even over infinite arenas), independently by Emerson and
Jutla [1^ and Mostowski [5l|. For SS2Gs, the existence of optimal positional strategies
follows from a result of Bewley and Kohlberg [3]. Independently, McIver and Morgan [4£,
Chatterjee et al. jl5l| and Zielonka 6j] extended these results to parity S2Gs.

Corollary 2.7. In any finite parity S2G, both players have globally optimal positional
strategies.

Since every finite S2G with $\omega$-regular objectives can be reduced to one with parity
objectives, we can conclude from Corollary 2.7 that both players have strongly optimal
pure finite-state strategies in finite S2Gs with arbitrary $\omega$-regular objectives.

Corollary 2.8. In any finite S2G with $\omega$-regular objectives, both players have strongly
optimal pure finite-state strategies.

2.6. Algorithmic problems. For the rest of this section, we only consider finite two-player
zero-sum games. The main computational problems for these games are computing
the value and optimal strategies for one or both players. Instead of computing the value
exactly, we can ask whether the value is at least some given rational probability $p$,
a problem which we call the quantitative decision problem:

Given an S2G $\mathcal{G}$, a vertex $v$ and a rational number $p \in [0, 1]$, decide whether
$\mathrm{val}^{\mathcal{G}}(v) \geq p$.

In many cases, it suffices to know whether the value is 1, i.e. whether player 0 has a
strategy to win the game almost surely (asymptotically, at least). We call the resulting
decision problem the qualitative decision problem.

Clearly, if we can solve the quantitative decision problem, we can approximate the values
$\mathrm{val}^{\mathcal{G}}(v)$ up to any desired precision by using binary search. In fact, for parity S2Gs it is
well-known that it suffices to solve the decision problems, since the other problems (computing
the values and optimal strategies) are polynomial-time equivalent to the quantitative
decision problem.
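The binary search mentioned above can be sketched as follows. The function name and the oracle `at_least` are hypothetical: `at_least(p)` is assumed to answer the quantitative decision problem for a fixed game and vertex, i.e. to return whether the value is at least `p`. Exact rationals are used so that the returned interval is guaranteed to contain the value.

```python
from fractions import Fraction
from typing import Callable, Tuple


def approximate_value(at_least: Callable[[Fraction], bool],
                      precision: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (lo, hi) with lo <= value <= hi and hi - lo <= precision.

    `at_least(p)` is an assumed oracle deciding whether the value is >= p.
    """
    lo, hi = Fraction(0), Fraction(1)    # every value lies in [0, 1]
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if at_least(mid):
            lo = mid                     # value >= mid
        else:
            hi = mid                     # value < mid
    return lo, hi
```

Each oracle call halves the interval, so reaching precision $2^{-k}$ takes $k$ calls.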

For a Markov decision process whose objective can be represented as a Muller objective, 
we can compute the values by an analysis of its end components: For a given initial vertex v, 
the value of the MDP from v equals the maximal probability of reaching a winning end 
component from v; this probability can be computed using linear programming. 

Even though the number of end components can be exponential, it is easy to see that
the union of all winning end components can be computed in polynomial time for MDPs
with Rabin or Muller objectives (given by a family of accepting sets). For MDPs with
Streett objectives, Chatterjee et al. gave a polynomial-time algorithm for computing
this set. Hence, for MDPs with any of these objectives, the quantitative decision problem
is solvable in polynomial time.

Theorem 2.9 ([23, 17]). The quantitative decision problem is in P for Streett, Rabin or
Muller MDPs.

It follows from Theorems 2.6 and 2.9 that the quantitative decision problem for Rabin-Streett
S2Gs is in NP: to decide whether $\mathrm{val}^{\mathcal{G}}(v) \geq p$, it suffices to guess a positional
strategy for player 0 and to check whether in the resulting Streett MDP the value from $v$
is not smaller than $p$. By determinacy, this result implies that the quantitative decision
problem is in coNP for Streett-Rabin S2Gs and in NP ∩ coNP for parity S2Gs.

Corollary 2.10. The quantitative decision problem is

- in NP for Rabin-Streett S2Gs,

- in coNP for Streett-Rabin S2Gs, and

- in NP ∩ coNP for parity S2Gs.

Table 1. Complexity of the qualitative and the quantitative decision problem for S2Gs.

                 Qualitative        Quantitative
SS2Gs            P-complete         NP ∩ coNP
Parity[d]        P-complete         NP ∩ coNP
Parity           UP ∩ coUP          NP ∩ coNP
Rabin-Streett    NP-complete        NP-complete
Streett-Rabin    coNP-complete      coNP-complete
Muller           PSPACE-complete    PSPACE-complete

A corresponding NP-hardness result for deterministic Rabin-Streett S2Gs was established
by Emerson and Jutla [1^. In particular, this hardness result also holds for the
qualitative decision problem. Moreover, by determinacy, this result can be turned into a
coNP-hardness result for (deterministic) Streett-Rabin S2Gs.

For S2Gs with Muller objectives, Chatterjee [13] showed that the quantitative decision
problem falls into PSPACE; for deterministic games, a polynomial-space algorithm had been
given earlier by McNaughton [HO]. A matching lower bound for deterministic games with
Muller objectives was provided by Hunter and Dawar [42].

Theorem 2.11 ([13, 113]). The quantitative and the qualitative decision problems are
PSPACE-complete for Muller S2Gs.



Theorem 2.11 does not hold if the Muller objective is given by a family of subsets of
vertices: Horn ^ 13] showed that the qualitative decision problem for explicit Muller S2Gs
is in P, and that the quantitative problem is in NP ∩ coNP.

Another class of S2Gs for which the qualitative decision problem is in P is, for each
$d \in \mathbb{N}$, the class Parity[d] of all parity S2Gs whose priority function refers to at most
$d$ priorities [2^ ]. In particular, the qualitative decision problem for SS2Gs as well as (co-)Büchi
S2Gs is in P. For general parity S2Gs, however, the qualitative decision problem is
only known to lie in UP ∩ coUP [H, [l3].

Theorem 2.12 ([44, 14, 26]). The qualitative decision problem is in UP ∩ coUP for parity
S2Gs. For each $d \in \mathbb{N}$, the qualitative decision problem is in P for parity S2Gs with at most
$d$ priorities.

Table 1 summarises the results about the complexity of the quantitative and the qualitative
decision problem for S2Gs. P-hardness (via LOGSPACE-reductions) for all these
problems follows from the fact that and-or graph reachability is P-complete [43].

The results summarised in Table 1 leave open the possibility that at least one of the
following problems is decidable in polynomial time:

(1) the qualitative decision problem for parity S2Gs, 

(2) the quantitative decision problem for SS2Gs, 

(3) the quantitative decision problem for parity S2Gs. 

Note that, given that all of them are contained in both NP and coNP, it is unlikely that 
one of them is NP-hard or coNP-hard; such a result would imply that NP = coNP, and the 
polynomial hierarchy would collapse. 

For the first problem, Chatterjee et al. iJ] gave a polynomial-time reduction to the
qualitative decision problem for deterministic two-player zero-sum parity games. Hence,
solving the qualitative decision problem for parity S2Gs is not harder than deciding which
of the two players has a winning strategy in a deterministic two-player zero-sum parity
game. Whether the latter problem is decidable in polynomial time is a long-standing open
problem. Several years after Emerson and Jutla [23] put the problem into NP ∩ coNP,
Jurdziński 4j] improved this bound slightly to UP ∩ coUP. Together with Paterson and
Zwick [45], he also gave an algorithm that decides the winner in subexponential time;
a randomised subexponential algorithm had been given earlier by Björklund et al. 0]. On
the other hand, Friedmann [s^ recently showed that the most promising candidate for a
polynomial-time algorithm for the general case so far, the discrete strategy improvement
algorithm due to Vöge and Jurdziński j6l|], requires exponential time in the worst case.

Regarding the second problem, only some progress towards a polynomial-time algorithm
has been made since Condon [l9|] proved membership in NP ∩ coNP; for instance, Björklund
and Vorobyov [18] gave a randomised subexponential algorithm for solving SS2Gs, and
Gimbert and Horn [35] showed that the quantitative decision problem for SS2Gs is
fixed-parameter tractable with respect to the number of stochastic vertices as the parameter.

For the third problem, Andersson and Miltersen 0] recently established a polynomial-time
Turing reduction to the second. Hence, there exists a polynomial-time algorithm
for (2) if and only if there exists one for (3). In particular, a polynomial-time algorithm
for (2) would also give a polynomial-time algorithm for (1). However, to the best of our
knowledge, it is plausible that the qualitative decision problem for parity S2Gs is in P while
the quantitative decision problem for SS2Gs is not.



3. Existence of Nash equilibria 

To capture rational behaviour of selfish players, Nash [s^] introduced the notion of what
is now called a Nash equilibrium. Formally, given a strategy profile $\sigma$ of a game $(\mathcal{G}, v_0)$,
we call a strategy $\tau$ of player $i$ in $\mathcal{G}$ a best response to $\sigma$ if $\tau$ maximises the expected payoff of
player $i$: $\Pr^{\sigma_{-i},\tau'}_{v_0}(\mathrm{Win}_i) \leq \Pr^{\sigma_{-i},\tau}_{v_0}(\mathrm{Win}_i)$ for all strategies $\tau'$ of player $i$. A strategy profile
$\sigma = (\sigma_i)_{i \in \Pi}$ is a Nash equilibrium if each $\sigma_i$ is a best response to $\sigma$.

In a Nash equilibrium, no player can improve her payoff by unilaterally switching to a
different strategy. In fact, to have a Nash equilibrium, it suffices that no player can gain
from switching to a pure strategy.

Proposition 3.1. A strategy profile $\sigma$ of a game $(\mathcal{G}, v_0)$ is a Nash equilibrium if and only if,
for each player $i$ and for each pure strategy $\tau$ of player $i$ in $\mathcal{G}$, $\Pr^{\sigma_{-i},\tau}_{v_0}(\mathrm{Win}_i) \leq \Pr^{\sigma}_{v_0}(\mathrm{Win}_i)$.

Proof. Clearly, if $\sigma$ is a Nash equilibrium, then $\Pr^{\sigma_{-i},\tau}_{v_0}(\mathrm{Win}_i) \leq \Pr^{\sigma}_{v_0}(\mathrm{Win}_i)$ for each pure
strategy $\tau$ of player $i$ in $\mathcal{G}$. Now, assume that $\sigma$ is not a Nash equilibrium. Hence,
$p := \sup_{\tau} \Pr^{\sigma_{-i},\tau}_{v_0}(\mathrm{Win}_i) = \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) + \varepsilon$ for some player $i$ and some $\varepsilon > 0$. Consider the Markov
decision process $\mathcal{G}^{\sigma_{-i}}$. Clearly, the value of $\mathcal{G}^{\sigma_{-i}}$ from $v_0$ equals $p$. By Theorem 2.4, there
exists an $\varepsilon/2$-optimal pure strategy $\tau$ in $(\mathcal{G}^{\sigma_{-i}}, v_0)$. Since the arena of $\mathcal{G}^{\sigma_{-i}}$ is a forest, we
can assume that $\tau$ is a positional strategy, which can be viewed as a pure strategy in $\mathcal{G}$.
We have $\Pr^{\sigma_{-i},\tau}_{v_0}(\mathrm{Win}_i) \geq p - \varepsilon/2 > p - \varepsilon = \Pr^{\sigma}_{v_0}(\mathrm{Win}_i)$. □

For two-player zero-sum games, a Nash equilibrium is just a pair of optimal strategies.

Proposition 3.2. Let $(\mathcal{G}, v_0)$ be an S2G. A strategy profile $(\sigma, \tau)$ of $(\mathcal{G}, v_0)$ is a Nash
equilibrium if and only if both $\sigma$ and $\tau$ are optimal. In particular, every Nash equilibrium
of $(\mathcal{G}, v_0)$ has payoff $(\mathrm{val}^{\mathcal{G}}(v_0), 1 - \mathrm{val}^{\mathcal{G}}(v_0))$.

Proof. ($\Leftarrow$) Assume that both $\sigma$ and $\tau$ are optimal, but that $(\sigma, \tau)$ is not a Nash equilibrium.
Hence, one of the players, say player 1, can improve her payoff by playing some strategy $\tau'$.
Hence, $\mathrm{val}^{\mathcal{G}}(v_0) = \Pr^{\sigma,\tau}_{v_0}(\mathrm{Win}_0) > \Pr^{\sigma,\tau'}_{v_0}(\mathrm{Win}_0)$. However, since $\sigma$ is optimal,
$\mathrm{val}^{\mathcal{G}}(v_0) \leq \Pr^{\sigma,\tau'}_{v_0}(\mathrm{Win}_0)$, a contradiction. The reasoning in the case that player 0 can improve is
analogous.

($\Rightarrow$) Let $(\sigma, \tau)$ be a Nash equilibrium of $(\mathcal{G}, v_0)$, and let us first assume that $\sigma$ is not
optimal, i.e. $\mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\mathcal{G}}(v_0)$. By the definition of $\mathrm{val}^{\mathcal{G}}$, there exists another strategy $\sigma'$
of player 0 such that $\mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\sigma'}(v_0) \leq \mathrm{val}^{\mathcal{G}}(v_0)$. We have

$\Pr^{\sigma,\tau}_{v_0}(\mathrm{Win}_0) \leq \mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\sigma'}(v_0) = \inf_{\tau'} \Pr^{\sigma',\tau'}_{v_0}(\mathrm{Win}_0) \leq \Pr^{\sigma',\tau}_{v_0}(\mathrm{Win}_0)$,

where the first inequality follows from the fact that $(\sigma, \tau)$ is a Nash equilibrium. Thus,
player 0 can improve her payoff by playing $\sigma'$ instead of $\sigma$, a contradiction to $(\sigma, \tau)$ being
a Nash equilibrium. The argumentation in the case that $\tau$ is not optimal is analogous. □

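In general, a Nash equilibrium can give a player a higher payoff than her value. However,
the payoff a player receives in a Nash equilibrium can never be lower than her value, and
this is true for every history that is consistent with the equilibrium. Formally, we say that
a strategy profile $\sigma$ of a game $(\mathcal{G}, v_0)$ is favourable if $\Pr^{\sigma}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \geq \mathrm{val}^{\mathcal{G}[x]}_i(v)$ for
each player $i$ and every history $xv$ that is consistent with $\sigma$.

Lemma 3.3. Let $(\mathcal{G}, v_0)$ be an SMG. Every Nash equilibrium of $(\mathcal{G}, v_0)$ is favourable.

Proof. Let $\sigma$ be a Nash equilibrium of $(\mathcal{G}, v_0)$, and assume there exists a history $xv$ of $(\mathcal{G}, v_0)$
that is consistent with $\sigma$, but $p := \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) < \mathrm{val}^{\mathcal{G}[x]}_i(v)$. By the definition of
$\mathrm{val}^{\mathcal{G}[x]}_i$, there exists a strategy $\tau$ of player $i$ in $\mathcal{G}[x]$ such that $\mathrm{val}^{\tau}(v) > p$. We define a new
strategy $\sigma'$ for player $i$ in $\mathcal{G}$ as follows: $\sigma'$ is defined as $\sigma_i$ for histories that do not begin with $xv$.
For histories of the form $xvy$, however, we set $\sigma'(xvy) = \tau(vy)$. Clearly,
$\Pr^{\sigma_{-i},\sigma'}_{v_0}(xv \cdot V^\omega) = \Pr^{\sigma}_{v_0}(xv \cdot V^\omega)$. Moreover, it is easy to see that
$\Pr^{\sigma_{-i},\sigma'}_{v_0}(X \setminus xv \cdot V^\omega) = \Pr^{\sigma}_{v_0}(X \setminus xv \cdot V^\omega)$ for every Borel
set $X \subseteq V^\omega$. Using Lemma 2.1, we can conclude that

$$
\begin{aligned}
\Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i)
&= \Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \Pr^{\sigma_{-i}[x],\sigma'[x]}_{v}(x^{-1}\mathrm{Win}_i) \cdot \Pr^{\sigma_{-i},\sigma'}_{v_0}(xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \Pr^{\sigma_{-i}[x],\tau}_{v}(x^{-1}\mathrm{Win}_i) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&\geq \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \mathrm{val}^{\tau}(v) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&> \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + p \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) + \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i).
\end{aligned}
$$

Hence, player $i$ can improve her payoff by switching to $\sigma'$, a contradiction to $\sigma$ being a Nash
equilibrium. □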



It follows from Theorem 2.5 and Proposition 3.2 that every finite two-player zero-sum
stochastic game with prefix-independent objectives has a Nash equilibrium in pure
strategies. Is this still true if the two-player zero-sum assumption is relaxed?

By Lemma 3.3, a pure strategy profile can only be a Nash equilibrium if it is favourable.
The next lemma shows that, conversely, we can turn every favourable pure strategy profile
into a Nash equilibrium. The proof uses so-called threat strategies (or trigger strategies),
which are added on top of the given strategy profile: each player threatens to change her
behaviour when one of the other players deviates from the prescribed strategy profile. Before
being applied to stochastic games, this concept proved fruitful in the related area of repeated
games (see [s^, Chapter 8] and [s!]).

Lemma 3.4. Let $(\mathcal{G}, v_0)$ be a finite SMG with prefix-independent objectives. If $\sigma$ is a
favourable pure strategy profile of $(\mathcal{G}, v_0)$, then $(\mathcal{G}, v_0)$ has a pure Nash equilibrium $\sigma^*$ with
$\Pr^{\sigma}_{v_0} = \Pr^{\sigma^*}_{v_0}$.

Proof. By Theorem 2.5, for each player $i$ we can fix a globally optimal pure strategy $\tau_i$
of the coalition $\Pi \setminus \{i\}$ in the coalition game $\mathcal{G}_i$; denote by $\tau_{j,i}$ the corresponding pure
strategy of player $j \neq i$ in $\mathcal{G}$. To simplify notation, we also define $\tau_{i,i}$ to be an arbitrary
pure strategy of player $i$ in $\mathcal{G}$. Player $i$'s equilibrium strategy $\sigma^*_i$ is defined as follows:
For histories $xv$ that are compatible with $\sigma$, we set $\sigma^*_i(xv) = \sigma_i(xv)$. If $xv$ is not compatible
with $\sigma$, then decompose $x$ into $x = x_1 \cdot x_2$, where $x_1$ is the longest prefix of $x$ that is
compatible with $\sigma$, and let $j$ be the player who has deviated, i.e. $x_1$ ends in $V_j$; we set
$\sigma^*_i(xv) = \tau_{i,j}(x_2 v)$. Intuitively, $\sigma^*_i$ behaves like $\sigma_i$ as long as no other player $j$ deviates from
playing $\sigma_j$, in which case $\sigma^*_i$ starts to behave like $\tau_{i,j}$.

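Note that $\Pr^{\sigma^*}_{v_0} = \Pr^{\sigma}_{v_0}$. We claim that $\sigma^*$ is additionally a Nash equilibrium of $(\mathcal{G}, v_0)$.
Let $i \in \Pi$, and let $\rho$ be a pure strategy of player $i$ in $\mathcal{G}$; by Proposition 3.1, it suffices to
show that $\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i) \leq \Pr^{\sigma^*}_{v_0}(\mathrm{Win}_i)$.

Let us call a history $xv \in V^*V_i$ a deviation history if $xv$ is compatible with both $\sigma$ and
$(\sigma_{-i}, \rho)$, but $\sigma_i(xv) \neq \rho(xv)$; we denote the set of all deviation histories consistent with $\sigma$
by $D$. Clearly, $\Pr^{\sigma^*}_{v_0}(xv \cdot V^\omega) = \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) = \Pr^{\sigma^*_{-i},\rho}_{v_0}(xv \cdot V^\omega)$ for all $xv \in D$.

Claim. $\Pr^{\sigma^*_{-i},\rho}_{v_0}(X \setminus D \cdot V^\omega) = \Pr^{\sigma}_{v_0}(X \setminus D \cdot V^\omega)$ for every Borel set $X \subseteq V^\omega$.
Proof. This claim can be proved by induction over the structure of Borel sets.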

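Claim. $\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \leq \mathrm{val}^{\mathcal{G}}_i(v)$ for every $xv \in D$.

Proof. By the definition of the strategies $\tau_{j,i}$, we have that $\Pr^{(\tau_{j,i})_{j \neq i},\rho}_{v}(\mathrm{Win}_i) \leq \mathrm{val}^{\mathcal{G}}_i(v)$ for
every vertex $v \in V$ and every strategy $\rho$ of player $i$. Moreover, if $xv$ is a deviation history,
then for each player $j \neq i$ the residual strategy $\sigma^*_j[xv]$ is equal to $\tau_{j,i}$ on histories that start
in $w := \rho(xv)$. Hence, by Lemma 2.1 and since $\mathrm{Win}_i$ is prefix-independent,

$$
\begin{aligned}
\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega)
&= \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xvw \cdot V^\omega) \\
&= \Pr^{\sigma^*_{-i}[xv],\rho[xv]}_{w}(\mathrm{Win}_i) \\
&= \Pr^{(\tau_{j,i})_{j \neq i},\rho[xv]}_{w}(\mathrm{Win}_i) \\
&\leq \mathrm{val}^{\mathcal{G}}_i(w) \\
&\leq \mathrm{val}^{\mathcal{G}}_i(v).
\end{aligned}
$$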

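Using the previous two claims, we prove that $\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i) \leq \Pr^{\sigma^*}_{v_0}(\mathrm{Win}_i)$ as follows:

$$
\begin{aligned}
\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i)
&= \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \cdot \Pr^{\sigma^*_{-i},\rho}_{v_0}(xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&\leq \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \mathrm{val}^{\mathcal{G}}_i(v) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&\leq \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) \cdot \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus D \cdot V^\omega) + \sum_{xv \in D} \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) \\
&= \Pr^{\sigma^*}_{v_0}(\mathrm{Win}_i),
\end{aligned}
$$

where the second inequality follows from the assumption that $\sigma$ is favourable. □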



A variant of Lemma 3.4 handles games with prefix-independent $\omega$-regular objectives
and finite-state strategies.

Lemma 3.5. Let $(\mathcal{G}, v_0)$ be a finite SMG with prefix-independent $\omega$-regular objectives. If
$\sigma$ is a favourable pure finite-state strategy profile of $(\mathcal{G}, v_0)$, then $(\mathcal{G}, v_0)$ has a pure
finite-state Nash equilibrium $\sigma^*$ with $\Pr^{\sigma}_{v_0} = \Pr^{\sigma^*}_{v_0}$.

Proof. The proof is analogous to the proof of Lemma 3.4. Since, by Corollary 2.8, there
exist optimal pure finite-state strategies in every finite SMG with $\omega$-regular objectives, the
strategies $\tau_{j,i}$ defined there can be assumed to be pure finite-state strategies. Consequently,
the equilibrium profile $\sigma^*$ can be implemented using finite-state strategies as well. □

[Figure 2: game graph with terminal payoffs (1, 1), (0, 0) and (1, 0).]

Figure 2. A two-player game with a pair of optimal strategies that cannot be extended
to a Nash equilibrium.



Using Lemma 3.4 and Theorem 2.5, we can easily prove the existence of pure Nash
equilibria in finite SMGs with prefix-independent objectives.

Theorem 3.6. There exists a pure Nash equilibrium in any finite SMG with
prefix-independent objectives.

Proof. Let $\mathcal{G}$ be a finite SMG with prefix-independent objectives and initial vertex $v_0$. By
Theorem 2.5 and the correspondence between $\mathcal{G}$ and the coalition game $\mathcal{G}_i$, each player $i$ has
a strongly optimal pure strategy $\sigma_i$ in $\mathcal{G}$. Let $\sigma = (\sigma_i)_{i \in \Pi}$. For every history $xv$ that is consistent
with $\sigma$ and each player $i$, we have $\Pr^{\sigma}_{v_0}(\mathrm{Win}_i \mid xv \cdot V^\omega) = \Pr^{\sigma[x]}_{v}(\mathrm{Win}_i) \geq \mathrm{val}^{\mathcal{G}}_i(v)$. Hence,
$\sigma$ is favourable, and Lemma 3.4 implies the existence of a pure Nash equilibrium. □

For finite SMGs with $\omega$-regular objectives, we can even show the existence of a pure
finite-state equilibrium.

Theorem 3.7. There exists a pure finite-state Nash equilibrium in any finite SMG with
$\omega$-regular objectives.

Proof. Since any SMG with $\omega$-regular objectives can be reduced to one with parity objectives
using finite memory, it suffices to consider parity SMGs. For these games, the claim
follows from Corollary 2.7 and Lemma 3.5 using the same argumentation as in the proof of
Theorem 3.6. □



Theorem 3.7 and a variant of Theorem 3.6 appeared originally in p^. However, their
proof contains an inaccuracy: Essentially, they claim that any profile of optimal strategies
can be extended to a Nash equilibrium with the same payoff (by adding threat strategies
on top). This is, in general, not true, as the following example demonstrates.

Example 3.8. Consider the deterministic two-player game $(\mathcal{G}, v_0)$ depicted in Fig. 2 and
played by players 0 and 1 (with payoffs given in this order). Clearly, the value $\mathrm{val}^{\mathcal{G}}_0(v_0)$
for player 0 from $v_0$ equals 1, and player 0's optimal strategy $\sigma$ is to play from $v_0$ to $v_1$.
For player 1, the value from $v_0$ is 0, and both of her positional strategies are optimal. In
particular, her strategy $\tau$ of playing from $v_1$ to the terminal vertex with payoff (1, 0) is
optimal (albeit not globally optimal). The payoff of the strategy profile $(\sigma, \tau)$ is (1, 0).
However, there is no Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff (1, 0): In any Nash equilibrium
of $(\mathcal{G}, v_0)$, player 0 will move from $v_0$ to $v_1$ with probability 1. Player 1's best response is
to play from $v_1$ to the terminal vertex with payoff (1, 1) with probability 1. Hence, every
Nash equilibrium of this game has payoff (1, 1).
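The example can be checked mechanically for pure strategies. The following Python sketch rests on an assumption about the figure, which is reconstructed from the text only: player 0 chooses at $v_0$ between $v_1$ and a terminal vertex with payoff (0, 0), and player 1 chooses at $v_1$ between the terminal vertices with payoffs (1, 1) and (1, 0). The vertex names `t00`, `t11` and `t10` are hypothetical. Since each player controls a single vertex and the game is acyclic, a unilateral pure deviation amounts to changing the move at that vertex; randomised strategies are not covered by this check.

```python
from itertools import product

# Assumed reconstruction of the game in Example 3.8 (the figure is not available).
MOVES = {"v0": ["v1", "t00"], "v1": ["t11", "t10"]}
OWNER = {"v0": 0, "v1": 1}
PAYOFF = {"t00": (0, 0), "t11": (1, 1), "t10": (1, 0)}


def outcome(profile):
    """Payoff vector of the unique play from v0 under a pure positional profile."""
    v = "v0"
    while v not in PAYOFF:
        v = profile[v]
    return PAYOFF[v]


def is_nash(profile):
    """True if no player gains by changing the move at her own vertex."""
    base = outcome(profile)
    for v, player in OWNER.items():
        for alternative in MOVES[v]:
            deviation = dict(profile)
            deviation[v] = alternative
            if outcome(deviation)[player] > base[player]:
                return False
    return True


def pure_equilibria():
    """All pure positional Nash equilibria together with their payoffs."""
    vertices = list(MOVES)
    profiles = [dict(zip(vertices, choice))
                for choice in product(*(MOVES[v] for v in vertices))]
    return [(p, outcome(p)) for p in profiles if is_nash(p)]
```

Under this reconstruction, the profile in which player 1 moves to the terminal vertex with payoff (1, 0) fails the check because player 1 gains by switching, in line with the argument of the example.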

4. Complexity of Nash equilibria

For the rest of this paper, we consider only finite SMGs. Previous research on algorithms for
finding Nash equilibria in such games has focused on computing some Nash equilibrium [16].
However, a game may have several Nash equilibria with different payoffs, and one might not
be interested in any Nash equilibrium but in one whose payoff fulfils certain requirements.
For example, one might look for a Nash equilibrium where certain players win almost surely
while certain others lose almost surely. This idea leads us to the following decision problem,
which we call NE:

Given an SMG $(\mathcal{G}, v_0)$ and thresholds $x, y \in [0, 1]^\Pi$, decide whether there
exists a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $\geq x$ and $\leq y$.

To obtain meaningful results, we assume that all transition probabilities in $\mathcal{G}$ as well as
the thresholds $x$ and $y$ are rational numbers (with numerator and denominator given in
binary) and that all objectives are $\omega$-regular. A qualitative variant of the problem, which
omits the thresholds, just asks about a Nash equilibrium where some distinguished player,
say player 0, wins with probability 1:

Given an SMG $(\mathcal{G}, v_0)$, decide whether there exists a Nash equilibrium of
$(\mathcal{G}, v_0)$ where player 0 wins almost surely.

Clearly, every instance of the qualitative variant can easily be turned into an instance of NE
(by adding the thresholds $x = (1, 0, \ldots, 0)$ and $y = (1, \ldots, 1)$). Hence, NE is, a priori, more
general than its qualitative variant.

Note that we have so far not put any restriction on the type of strategies that realise 
the equilibrium. It is natural to restrict the search space to profiles of pure, finite-state, 
pure finite-state, stationary or positional strategies. We denote the corresponding decision 
problems by PureNE, FinNE, PureFinNE, StatNE and PosNE, respectively. In the rest 
of this paper, we are going to prove upper and lower bounds on the complexity of these 
problems, where all lower bounds even hold for the qualitative variants of these problems. 

Our first observation is that neither stationary nor pure strategies are sufficient to
implement any Nash equilibrium, even in SSMGs and even if we are only interested in
whether a player wins or loses almost surely in the equilibrium. Together with another result
from this section (Proposition 4.13), this demonstrates that the problems NE, PureNE,
FinNE, PureFinNE, StatNE and PosNE are distinct problems, which have to be analysed
separately. This is in sharp contrast to the situation for SS2Gs where all these problems
coincide because SS2Gs admit globally optimal positional strategies.

Proposition 4.1. There exists an SSMG with a stationary Nash equilibrium where player 0
wins almost surely, but with no pure Nash equilibrium where player 0 wins with positive
probability.

Proof. Consider the SSMG depicted in Fig. 3, played by three players 0, 1 and 2 (with
payoffs given in this order). Clearly, the stationary strategy profile where from vertex $v_2$
player 0 selects both outgoing transitions with probability 1/2 each, player 1 plays from $v_0$
to $v_1$ and player 2 plays from $v_1$ to $v_2$ is a Nash equilibrium where player 0 wins almost
surely. However, for any pure strategy profile where player 0 wins with positive probability,
either player 1 or player 2 receives payoff 0 and could improve her payoff by switching her
strategy at $v_0$ or $v_1$, respectively. □

Figure 3. An SSMG with no pure Nash equilibrium where player 0 wins with positive
probability.

Figure 4. An SSMG with no stationary Nash equilibrium where player 0 wins with
positive probability.


Proposition 4.2. There exists an SSMG with a pure finite-state Nash equilibrium where
player 0 wins almost surely, but with no stationary Nash equilibrium where player 0 wins
with positive probability.

Proof. Consider the (deterministic) SSMG $\mathcal{G}$ depicted in Fig. 4, also played by three players
0, 1 and 2. Clearly, the pure finite-state strategy profile that leads to the terminal vertex
with payoff (1, 0, 0) and where at $v_2$ player 0 plays "right" if player 1 has played to $v_0$
and "left" if player 2 has played to $v_0$ is a Nash equilibrium of $(\mathcal{G}, v_0)$. Now consider any
stationary equilibrium of $(\mathcal{G}, v_0)$ where player 0 wins with positive probability. If at $v_2$ the
stationary strategy of player 0 prescribes to play "right" with positive probability, then
player 2 can improve her payoff by playing to $v_2$ with probability 1, and otherwise player 1
can improve her payoff by playing to $v_2$ with probability 1, a contradiction. □



4.1. Positional equilibria. In this subsection, we analyse the complexity of the (presumably)
simplest of the decision problems introduced so far, namely PosNE. Not surprisingly,
this problem is decidable; in fact, it is NP-complete for all types of objectives we consider
in this paper. Let us start by proving membership in NP. Since terminal reachability,
(co-)Büchi and parity objectives can easily be translated to Rabin or Streett objectives,
it suffices to consider Streett-Rabin and Muller SMGs.

Theorem 4.3. PosNE is in NP for Streett-Rabin SMGs and Muller SMGs.

Proof. To decide PosNE, on input $\mathcal{G}, v_0, x, y$, we can guess a positional strategy profile $\sigma$, i.e.
a mapping $\bigcup_{i \in \Pi} V_i \to V$; then, we verify whether $\sigma$ is a Nash equilibrium with the desired
payoff. To do this, we first compute the payoff $z_i$ of $\sigma$ for each player $i$ by computing
the probability of the event $\mathrm{Win}_i$ in the (finite) Markov chain $(\mathcal{G}^{\sigma}, v_0)$. Once each $z_i$ is
computed, we can easily check whether $x_i \leq z_i \leq y_i$. To verify that $\sigma$ is a Nash equilibrium,
we additionally compute, for each player $i$, the value $r_i$ of the (finite) MDP $\mathcal{G}^{\sigma_{-i}}$ from $v_0$.
Clearly, $\sigma$ is a Nash equilibrium if and only if $r_i \leq z_i$ for each player $i$. Since we can compute
the value of an MDP (or a Markov chain) with a Streett, Rabin or Muller objective in
polynomial time (Theorem 2.9), all these checks can be carried out in polynomial time. □
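The verification step of this proof can be illustrated by a small Python sketch for the special case of games whose payoffs are attached to terminal vertices. It is not the exact polynomial-time procedure referred to in the proof: instead of rational arithmetic and linear programming, it approximates the payoff $z_i$ and the best-response value $r_i$ from below by value iteration with floating-point numbers and compares them up to a tolerance. The data layout (`succ`, `prob`, `owner`, `payoff`) and all function names are assumptions made for this illustration.

```python
def evaluate(succ, prob, owner, payoff, profile, player,
             best_response=False, sweeps=1000):
    """Approximate player's winning probability from every vertex.

    succ:   controlled vertex -> list of successors
    prob:   stochastic vertex -> {successor: probability}
    owner:  controlled vertex -> player index
    payoff: terminal vertex   -> payoff tuple with entries in {0, 1}
    With best_response=True, the vertices of `player` maximise (MDP);
    otherwise every controlled vertex follows `profile` (Markov chain).
    """
    x = {v: 0.0 for v in list(succ) + list(prob)}
    x.update({t: float(z[player]) for t, z in payoff.items()})
    for _ in range(sweeps):              # value iteration from below
        for v in succ:
            if best_response and owner[v] == player:
                x[v] = max(x[w] for w in succ[v])
            else:
                x[v] = x[profile[v]]
        for v, dist in prob.items():
            x[v] = sum(p * x[w] for w, p in dist.items())
    return x


def check_positional_ne(succ, prob, owner, payoff, profile, v0,
                        lower, upper, tol=1e-9):
    """Check thresholds and the equilibrium condition r_i <= z_i (approximately)."""
    for i in range(len(lower)):
        z = evaluate(succ, prob, owner, payoff, profile, i)[v0]
        r = evaluate(succ, prob, owner, payoff, profile, i,
                     best_response=True)[v0]
        if not (lower[i] - tol <= z <= upper[i] + tol):
            return False                 # payoff outside [x_i, y_i]
        if r > z + tol:
            return False                 # player i has a profitable deviation
    return True
```

An NP procedure would guess `profile` and run this check once; the guess is accepted only if every player's payoff lies within the thresholds and no player's best-response value exceeds her payoff.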

To establish NP-completeness, we still need to show NP-hardness. In fact, the reduction 
we are going to present does not only work for PosNE, but also for StatNE, where we allow 
arbitrary stationary equilibria. 

Theorem 4.4. PosNE and StatNE are NP-hard, even for SSMGs with only two players
(three players for the qualitative variants). 

Proof. The proof is by reduction from SAT. Given a Boolean formula $\varphi = C_1 \wedge \dots \wedge C_m$ in conjunctive normal form over propositional variables $X_1, \dots, X_n$, where without loss of generality $m \geq 1$ and each clause is non-empty, we show how to construct a two-player SSMG $(\mathcal{G}, v_0)$ such that the following statements are equivalent:

(1) $\varphi$ is satisfiable.

(2) $(\mathcal{G}, v_0)$ has a positional Nash equilibrium with payoff $(1, \frac{1}{2})$.

(3) $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium with payoff $(1, \frac{1}{2})$.

Provided that the game can be constructed in polynomial time, these equivalences establish both reductions. The game $\mathcal{G}$ is depicted in Fig. 5. The game proceeds from the initial vertex $v_0$ to $X_i$ or $\neg X_i$ with probability $1/2^{i+1}$ each, and to vertex $\varphi$ with probability $1/2^{n+1}$; with the remaining probability of $1/2^{n+1}$ the game proceeds to a terminal vertex with payoff $(1, 0)$. From $\varphi$, the game proceeds to each vertex $C_j$ with probability $1/(m+1)$; with the remaining probability of $1/(m+1)$, the game proceeds to a terminal vertex with payoff $(1, 1)$. From vertex $C_j$ (controlled by player 1), there is a transition to a literal $L$, i.e. $L = X_i$ or $L = \neg X_i$, if and only if $L$ occurs inside the clause $C_j$. Obviously, the game $\mathcal{G}$ can be constructed from $\varphi$ in polynomial time. We conclude the proof by showing that (1)-(3) are equivalent.

(1 ⇒ 2) Assume that $\alpha \colon \{X_1, \dots, X_n\} \to \{\text{true}, \text{false}\}$ is a satisfying assignment of $\varphi$. Consider the positional strategy profile where player 0 moves from a literal $L$ to the neighbouring $\top$-labelled vertex if and only if $L$ is mapped to true by $\alpha$, and player 1 moves from vertex $C_j$ to a literal $L$ that is contained in $C_j$ and mapped to true by $\alpha$ (which is possible since $\alpha$ is a satisfying assignment); at $\top$-labelled vertices, player 1 never plays to a terminal vertex. Obviously, player 0 wins almost surely in this strategy profile. In order to compute player 1's payoff, note that for each variable $X$ player 1 either receives payoff 1 from $X$ and payoff 0 from $\neg X$, or she receives payoff 1 from $\neg X$ and payoff 0 from $X$ (because player 0 plays according to a well-defined assignment). Moreover, player 1 wins almost surely from $\varphi$ since that assignment satisfies $\varphi$. Hence, player 1's payoff equals

$$\sum_{i=1}^{n} \frac{1}{2^{i+1}} + \frac{1}{2^{n+1}} = \frac{1}{2}.$$

Obviously, changing her strategy cannot give player 1 a better payoff. Therefore, we have identified a Nash equilibrium.
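The payoff computation in this direction can be checked with exact arithmetic. The sketch below encodes only the transition probabilities out of $v_0$ stated in the construction; the vertex names and the helper functions are our own, and we assume, as in the argument above, that player 1 wins from exactly one literal per variable and almost surely from $\varphi$.

```python
from fractions import Fraction


def initial_distribution(n):
    """Transition probabilities out of v0 for a formula with n variables."""
    dist = {}
    for i in range(1, n + 1):
        dist[("X", i)] = Fraction(1, 2 ** (i + 1))
        dist[("notX", i)] = Fraction(1, 2 ** (i + 1))
    dist["phi"] = Fraction(1, 2 ** (n + 1))
    dist["terminal(1,0)"] = Fraction(1, 2 ** (n + 1))
    return dist


def player1_payoff(n, assignment):
    """Payoff of player 1 when player 0 follows the given assignment."""
    dist = initial_distribution(n)
    total = Fraction(0)
    for i in range(1, n + 1):
        true_literal = ("X", i) if assignment[i] else ("notX", i)
        total += dist[true_literal]  # payoff 1 from the true literal only
    total += dist["phi"]             # player 1 wins almost surely from phi
    return total
```

For every assignment the result is exactly $\frac{1}{2}$, independently of $n$.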

(2 ⇒ 3) Trivial.

(3 ⇒ 1) Let $\sigma = (\sigma_0, \sigma_1)$ be a stationary Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $(1, \frac{1}{2})$. Our first aim is to show that $\sigma_0$ is actually a positional strategy. Consider any literal $L$ such that $\sigma_0(L)$ assigns probability $q > 0$ to the neighbouring $\top$-labelled vertex. Since player 0 wins almost surely, we know that player 1 never plays to a terminal vertex with payoff $(0, 1)$. Hence, the expected payoff for player 1 from $L$ equals $q$. However, by playing to a terminal vertex with payoff $(0, 1)$, player 1 can get payoff $2q/(1+q)$ from $L$. Since $\sigma$ is a Nash equilibrium, we have $2q/(1+q) \leq q$, which implies that $q = 1$.

Figure 5. Reducing SAT to PosNE and StatNE.

Now we define a pseudo assignment $\alpha \colon \{X_1, \neg X_1, \dots, X_n, \neg X_n\} \to \{\text{true}, \text{false}\}$ by setting $\alpha(L) = \text{true}$ if and only if $\sigma_0$ prescribes to go from vertex $L$ to the neighbouring $\top$-labelled vertex. Our next aim is to show that $\alpha$ is actually an assignment: $\alpha(X_i) = \text{true}$ if and only if $\alpha(\neg X_i) = \text{false}$. To see this, note that we can compute player 1's expected payoff from $v_0$ as follows:

$$\frac{1}{2} = \frac{p}{2^{n+1}} + \sum_{i=1}^{n} \frac{a_i}{2^{i+1}}, \qquad a_i = \begin{cases} 0 & \text{if } \alpha(X_i) = \alpha(\neg X_i) = \text{false}, \\ 1 & \text{if } \alpha(X_i) \neq \alpha(\neg X_i), \\ 2 & \text{if } \alpha(X_i) = \alpha(\neg X_i) = \text{true}, \end{cases}$$

where $p$ is the expected payoff for player 1 from vertex $\varphi$. By the construction of $\mathcal{G}$, we have $p > 0$, and the equality only holds if $p = 1$ and $a_i = 1$ for all $i = 1, \dots, n$, which proves that $\alpha$ is an assignment.
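The uniqueness claim behind this step, namely that an equation of the form $\frac{1}{2} = p/2^{n+1} + \sum_{i} a_i/2^{i+1}$ with $a_i \in \{0, 1, 2\}$ and $0 < p \leq 1$ forces $p = 1$ and $a_i = 1$ for all $i$, can be confirmed by brute force for small $n$. The following sketch assumes exactly this form of the equation; the function name is ours.

```python
from fractions import Fraction
from itertools import product


def consistent_counts(n):
    """All (a, p) with a in {0,1,2}^n and 0 < p <= 1 solving the equation."""
    solutions = []
    for a in product((0, 1, 2), repeat=n):
        s = sum(Fraction(a_i, 2 ** (i + 1)) for i, a_i in enumerate(a, start=1))
        # Solve 1/2 = p / 2^(n+1) + s for p.
        p = (Fraction(1, 2) - s) * 2 ** (n + 1)
        if 0 < p <= 1:
            solutions.append((a, p))
    return solutions
```

For each tested $n$ the only solution is the all-ones vector with $p = 1$; the enumeration is exponential in $n$ and serves only as a sanity check.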

Finally, we claim that $\alpha$ satisfies $\varphi$. If this were not the case, there would exist a clause $C$ such that player 1's expected payoff from vertex $C$ equals 0, and therefore $p < 1$. This is a contradiction to $p = 1$, as we have shown above.

To show that the qualitative variants of PosNE and StatNE are also NP-hard, it suffices to modify the game $\mathcal{G}$ as follows: First, we add one new player, player 2, who wins at precisely those terminal vertices where player 1 loses. Second, we add two new vertices $v_1$ and $v_2$. At $v_1$, player 1 has the choice to leave the game; if she decides to stay inside the game, the play proceeds to $v_2$, where player 2 has the choice to leave the game; if she also decides to stay inside the game, the play proceeds to vertex $v_0$ from where the game continues normally; if player 1 or player 2 decides to leave the game, then each of them receives payoff $\frac{1}{2}$, but player 0 receives payoff 0. Let us denote the modified game by $\mathcal{G}'$. It is straightforward to see that the following statements are equivalent:

(1) $(\mathcal{G}', v_1)$ has a stationary Nash equilibrium where player 0 wins almost surely.

(2) $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium with payoff $(1, \frac{1}{2})$.

(3) $\varphi$ is satisfiable.

(4) $(\mathcal{G}, v_0)$ has a positional Nash equilibrium with payoff $(1, \frac{1}{2})$.

(5) $(\mathcal{G}', v_1)$ has a positional Nash equilibrium where player 0 wins almost surely. □

4.2. Stationary equilibria. To prove the decidability of StatNE, we appeal to results established for the existential theory of the reals, the set of all existential first-order sentences (over the appropriate signature) that hold in the ordered field $\mathfrak{R} := (\mathbb{R}, +, \cdot, 0, 1, \leq)$.
The best known upper bound for the complexity of the associated decision problem is 
PSPACE [1^, which leads to the following theorem. 

Theorem 4.5. StatNE is in Pspace for Streett-Rabin SMGs and Muller SMGs. 

Proof. Since Pspace = NPspace, it suffices to provide a nondeterministic algorithm with polynomial space requirements for deciding StatNE. On input $\mathcal{G}, v_0, x, y$, where without loss of generality $\mathcal{G}$ is an SMG with Muller objectives given by $\mathcal{F}_i \subseteq \mathcal{P}(C)$, the algorithm starts by guessing the support $S \subseteq V \times V$ of a stationary strategy profile $\sigma$ of $\mathcal{G}$, i.e. $S = \{(v, w) \in V \times V : \sigma(w \mid v) > 0\}$. From the set $S$ alone, by standard graph algorithms, one can compute for each player $i$ the following sets in polynomial time (see [3, Chapter 10]):

(1) the union $F_i$ of all bottom SCCs $U$ of the Markov chain $\mathcal{G}^{\sigma}$ with $\chi(U) \in \mathcal{F}_i$,

(2) the set $R_i$ of vertices $v$ such that $\Pr^{\sigma}_{v}(\mathrm{Reach}(F_i)) > 0$,

(3) the union $T_i$ of all end components $U$ of the MDP $\mathcal{G}^{\sigma_{-i}}$ with $\chi(U) \in \mathcal{F}_i$.

After computing all these sets, the algorithm evaluates an existential first-order sentence $\psi$, which can be computed in polynomial time from $\mathcal{G}$, $v_0$, $x$, $y$, $S$, $(R_i)_{i \in \Pi}$, $(F_i)_{i \in \Pi}$ and $(T_i)_{i \in \Pi}$, over $\mathfrak{R}$ and returns the answer to this query.

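What does $\psi$ look like? Let $\alpha = (\alpha_{vw})_{v,w \in V}$, $r = (r^i_v)_{i \in \Pi, v \in V}$ and $z = (z^i_v)_{i \in \Pi, v \in V}$ be three sets of variables, and let $V_* = \bigcup_{i \in \Pi} V_i$. The formula

$$\varphi(\alpha) := \bigwedge_{v \in V_*} \Big( \bigwedge_{w \in v\Delta} \alpha_{vw} \geq 0 \;\wedge \bigwedge_{w \in V \setminus v\Delta} \alpha_{vw} = 0 \;\wedge \sum_{w \in v\Delta} \alpha_{vw} = 1 \Big) \;\wedge \bigwedge_{v \in V \setminus V_*,\, w \in V} \alpha_{vw} = \Delta(w \mid v) \;\wedge \bigwedge_{(v,w) \in S} \alpha_{vw} > 0 \;\wedge \bigwedge_{(v,w) \notin S} \alpha_{vw} = 0$$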
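
states that the mapping $\sigma \colon V \to [0, 1]^V$, defined by $\sigma(v)(w) = \alpha_{vw}$, constitutes a valid stationary strategy profile of $\mathcal{G}$ whose support is $S$. Provided that $\varphi(\alpha)$ holds in $\mathfrak{R}$, the formula

$$\eta_i(\alpha, z) := \bigwedge_{v \in F_i} z^i_v = 1 \;\wedge \bigwedge_{v \in V \setminus R_i} z^i_v = 0 \;\wedge \bigwedge_{v \in V \setminus F_i} z^i_v = \sum_{w \in v\Delta} \alpha_{vw} \cdot z^i_w$$

states that $z^i_v = \Pr^{\sigma}_{v}(\mathrm{Win}_i)$ for each $v \in V$, where $\sigma$ is defined as above. This follows from a well-known result about Markov chains, namely that the vector of the aforementioned probabilities is the unique solution of the given system of equations (see [3, Chapter 10]). Finally, the formula

$$\theta_i(\alpha, r) := \bigwedge_{v \in V} r^i_v \geq 0 \;\wedge \bigwedge_{v \in T_i} r^i_v = 1 \;\wedge \bigwedge_{v \in V_i,\, w \in v\Delta} r^i_v \geq r^i_w \;\wedge \bigwedge_{v \in V \setminus V_i} r^i_v = \sum_{w \in v\Delta} \alpha_{vw} \cdot r^i_w$$

states that $r^i_v \geq \sup_{\tau} \Pr^{\sigma_{-i}, \tau}_{v}(\mathrm{Win}_i)$ for all $v \in V$ (see [3, Chapter 10]).

The desired sentence $\psi$ is the existential closure of the conjunction of $\varphi$ and, for each player $i$, the formulae $\eta_i$ and $\theta_i$ combined with formulae stating that player $i$ cannot improve her payoff and that the expected payoff for player $i$ lies in between the given thresholds:

$$\psi := \exists \alpha\, \exists r\, \exists z\, \Big( \varphi(\alpha) \wedge \bigwedge_{i \in \Pi} \big( \eta_i(\alpha, z) \wedge \theta_i(\alpha, r) \wedge r^i_{v_0} \leq z^i_{v_0} \wedge x_i \leq z^i_{v_0} \leq y_i \big) \Big).$$

Clearly, $\psi$ holds in $\mathfrak{R}$ if and only if $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium with payoff at least $x$ and at most $y$ whose support is $S$. Consequently, the algorithm is correct. □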
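For a fixed numerical profile, the system of equations described by $\eta_i$ is linear and can be solved directly. The sketch below is only an illustration for a concrete Markov chain, not the decision procedure over the reals used in the proof: it assumes the sets $F_i$ and $R_i$ are already given, represents the chain as a hypothetical nested dictionary `alpha`, and solves the system by Gauss-Jordan elimination over exact rationals.

```python
from fractions import Fraction


def solve_eta(alpha, F, R):
    """Solve z_v = 1 (v in F), z_v = 0 (v not in R),
    and z_v = sum_w alpha[v][w] * z_w for the remaining vertices."""
    verts = sorted(alpha)
    idx = {v: k for k, v in enumerate(verts)}
    n = len(verts)
    # Augmented matrix of the linear system A z = b.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for v in verts:
        row = A[idx[v]]
        row[idx[v]] = Fraction(1)
        if v in F:
            row[n] = Fraction(1)
        elif v in R:
            for w, p in alpha[v].items():
                row[idx[w]] -= Fraction(p)
        # Otherwise v cannot reach F and the row encodes z_v = 0.
    # Gauss-Jordan elimination; a pivot exists because the solution is unique.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return {v: A[idx[v]][n] for v in verts}


# A chain that loops at s with probability 1/2 and otherwise wins or loses.
alpha = {
    "s": {"s": Fraction(1, 2), "win": Fraction(1, 4), "lose": Fraction(1, 4)},
    "win": {"win": Fraction(1)},
    "lose": {"lose": Fraction(1)},
}
z = solve_eta(alpha, F={"win"}, R={"s", "win"})
```

Here `z["s"]` is the probability of eventually reaching the winning bottom SCC from `s`.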



In Section 4.1, we showed that StatNE is NP-hard, leaving a considerable gap to our upper bound of Pspace. Towards gaining a better understanding, we relate StatNE to the square root sum problem (SqrtSum) of deciding, given numbers $d_1, \dots, d_n, k \in \mathbb{N}$, whether

$$\sum_{i=1}^{n} \sqrt{d_i} \geq k.$$
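To make the problem statement concrete, the following sketch decides small instances with exact integer arithmetic. It says nothing about the complexity of SqrtSum: the bit budget `max_bits` and the function name are our own choices, and the routine simply gives up (returns `None`) if the budget is exhausted before the comparison is certified.

```python
from math import isqrt


def sqrt_sum_at_least(ds, k, max_bits=4096):
    """Return True or False if sum(sqrt(d)) >= k can be certified, else None."""
    # If every d is a perfect square, the sum is an integer: compare exactly.
    if all(isqrt(d) ** 2 == d for d in ds):
        return sum(isqrt(d) for d in ds) >= k
    bits = 64
    while bits <= max_bits:
        scale = 1 << (2 * bits)
        # isqrt(d * 4^bits) equals floor(sqrt(d) * 2^bits).
        lower = sum(isqrt(d * scale) for d in ds)
        upper = lower + len(ds)  # each floor loses less than 1
        target = k << bits
        if lower >= target:
            return True
        if upper <= target:
            return False
        bits *= 2
    return None
```

For example, `sqrt_sum_at_least([2, 3], 3)` certifies $\sqrt{2} + \sqrt{3} \geq 3$ at the first precision level.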

Recently, Allender et al. [1] showed that SqrtSum belongs to the fourth level of the counting hierarchy, a slight improvement over the previously known Pspace upper bound.
However, it has been an open question since the 1970s as to whether SqrtSum falls into 
the polynomial hierarchy [3J, |30t]. We identify a polynomial-time reduction from SqrtSum 
to StatNE for SSMGs. Hence, StatNE is at least as hard as SqrtSum, and showing that StatNE resides inside the polynomial hierarchy would imply a major breakthrough in understanding the complexity of numerical computation.

Theorem 4.6. SqrtSum is polynomial-time reducible to StatNE, even for 4-player SSMGs. 

Before we start with the proof of the theorem, let us first examine the game $\mathcal{G}(p)$, where $0 \leq p \leq 1$, played by players 0, 1, 2 and 3 and depicted in Fig. 7.

Lemma 4.7. The maximal payoff player 3 receives in a stationary Nash equilibrium of $(\mathcal{G}(p), s_1)$ where player 0 wins almost surely equals $\sqrt{p}$.

Proof. In the following, assume without loss of generality that $0 < p < 1$ (otherwise the statement is trivial), and define $q := 1 - p$. For any stationary strategy profile $\sigma$ of $\mathcal{G}(p)$ where player 0 wins almost surely, let $x_1 = \sigma_0(s_2 \mid t_1)$ and $x_2 = \sigma_0(s_1 \mid t_2)$ be the probabilities that player 0 "stays inside the game" at $t_1$, respectively $t_2$. Given $x_1$ and $x_2$, for $i = 1, 2$ we can compute the payoff $f_i(x_1, x_2) := \Pr^{\sigma}_{s_i}(\mathrm{Win}_i)$ for player $i$ from $s_i$ by

$$f_i(x_1, x_2) = \frac{p/2 + q(1 - x_i)}{1 - q^2 x_1 x_2}.$$

Footnote. Some authors define SqrtSum using $\leq$ instead of $\geq$. With this definition, we would reduce from the complement of SqrtSum instead.

Figure 6. Reducing SqrtSum to StatNE.

Figure 7. The game $\mathcal{G}(p)$.

To have a Nash equilibrium, it must be the case that $f_1(x_1, x_2), f_2(x_1, x_2) \geq \frac{1}{2}$ since otherwise player 1 or player 2 would prefer to leave the game at $s_1$ or $s_2$, respectively, which would give the respective player payoff $\frac{1}{2}$ immediately. Vice versa, if $f_1(x_1, x_2), f_2(x_1, x_2) \geq \frac{1}{2}$ then $\sigma$ is a Nash equilibrium with expected payoff

$$f(x_1, x_2) := \frac{p + q x_1 p}{1 - q^2 x_1 x_2}$$

for player 3. Hence, to determine the maximum payoff for player 3 in a stationary Nash equilibrium where player 0 wins almost surely, we have to maximise $f(x_1, x_2)$ under the constraints $f_1(x_1, x_2), f_2(x_1, x_2) \geq \frac{1}{2}$ and $0 \leq x_1, x_2 \leq 1$. We claim that the maximum is reached only if $x_1 = x_2$. If e.g. $x_1 > x_2$, then we can achieve a higher payoff for player 3 by setting $x_2' := x_1$, and the constraints are still satisfied:

$$\frac{p/2 + q(1 - x_2')}{1 - q^2 x_1 x_2'} = \frac{p/2 + q(1 - x_1)}{1 - q^2 x_1^2} \geq \frac{p/2 + q(1 - x_1)}{1 - q^2 x_1 x_2} \geq \frac{1}{2}.$$



Hence, it suffices to maximise $f(x, x)$ subject to $f_1(x, x) \geq \frac{1}{2}$ and $0 \leq x \leq 1$, which is equivalent to maximising $f(x, x)$ subject to $(1 - p)x^2 - 2x + 1 \geq 0$ and $0 \leq x \leq 1$. The roots of the polynomial are $(1 \pm \sqrt{p})/(1 - p)$, but $(1 + \sqrt{p})/(1 - p) > 1$ for $p > 0$. Therefore, any solution $x$ must satisfy $x \leq x_0 := (1 - \sqrt{p})/(1 - p)$. Since $0 \leq x_0 \leq 1$ for $0 < p < 1$ and $f(x, x)$ is strictly increasing on $[0, 1]$, the optimal solution is $x_0$, and the maximal payoff for player 3 in a stationary Nash equilibrium of $(\mathcal{G}(p), s_1)$ where player 0 wins almost surely equals indeed

$$f(x_0, x_0) = \frac{p + q x_0 p}{1 - q^2 x_0^2} = \frac{p}{1 - q x_0} = \frac{p}{1 - (1 - p) x_0} = \frac{p}{1 - (1 - \sqrt{p})} = \sqrt{p}.$$

□
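The closed forms used in this proof are easy to sanity-check numerically. The sketch below evaluates $f_i$, $f$ and $x_0$ as given above in floating-point arithmetic (the Python function names are ours), so the comparisons hold only up to rounding.

```python
from math import sqrt, isclose


def f_i(p, x_i, x1, x2):
    """Payoff of player i (i = 1, 2) from s_i; x_i is x1 or x2."""
    q = 1 - p
    return (p / 2 + q * (1 - x_i)) / (1 - q * q * x1 * x2)


def f(p, x1, x2):
    """Payoff of player 3 when players 1 and 2 stay inside the game."""
    q = 1 - p
    return (p + q * x1 * p) / (1 - q * q * x1 * x2)


def x0(p):
    """Smaller root of (1 - p) x^2 - 2 x + 1."""
    return (1 - sqrt(p)) / (1 - p)
```

At $x = x_0$ the constraint $f_1(x, x) \geq \frac{1}{2}$ is tight and the payoff of player 3 equals $\sqrt{p}$; slightly above $x_0$ the constraint fails.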

Proof of Theorem 4.6. Given an instance $(d_1, \dots, d_n, k)$ of SqrtSum, where without loss of generality $n > 0$, $d_i > 0$ for each $i = 1, \dots, n$, and $d := \sum_{i=1}^{n} d_i$, we construct a 4-player SSMG $(\mathcal{G}, v_0)$ such that $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium where player 0 wins almost surely if and only if $\sum_{i=1}^{n} \sqrt{d_i} \geq k$. Define $p_i := d_i/d^2$ for $i = 1, \dots, n$. For the reduction, we use $n$ copies of the game $\mathcal{G}(p)$, where in the $i$th copy we set $p$ to $p_i$. The complete game $\mathcal{G}$ is depicted in Fig. 6. By Lemma 4.7, the maximal payoff player 3 receives in a stationary Nash equilibrium of $(\mathcal{G}(p_i), s_1)$ where player 0 wins almost surely equals $\sqrt{p_i} = \sqrt{d_i}/d$. Hence, the maximal payoff player 3 receives in a stationary Nash equilibrium of $(\mathcal{G}, v_1)$ where player 0 wins almost surely equals

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\sqrt{d_i}}{d} = \frac{1}{dn} \sum_{i=1}^{n} \sqrt{d_i}.$$

If $\sum_{i=1}^{n} \sqrt{d_i} \geq k$, then we can extend such an equilibrium to a stationary Nash equilibrium of $(\mathcal{G}, v_0)$ where player 0 wins almost surely by letting player 3 play from $v_0$ to $v_1$ with probability 1. On the other hand, if $\sum_{i=1}^{n} \sqrt{d_i} < k$, then in any stationary Nash equilibrium of $(\mathcal{G}, v_0)$ player 3 plays to $v_1$ with probability 0, and player 0 loses almost surely. □
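The parameters of the reduction can be computed as follows. The sketch assumes that the alternative available to player 3 at $v_0$ yields payoff $k/(dn)$; this value is inferred from the comparison in the proof rather than read off the figure, and the function names are ours. The best payoff is evaluated in floating point.

```python
from fractions import Fraction
from math import sqrt


def reduction_parameters(ds, k):
    """Return the probabilities p_i = d_i / d^2 and the assumed threshold k / (d n)."""
    n, d = len(ds), sum(ds)
    ps = [Fraction(d_i, d * d) for d_i in ds]
    threshold = Fraction(k, d * n)  # assumption: payoff of player 3's outside option
    return ps, threshold


def best_payoff_player3(ps):
    """(1/n) * sum_i sqrt(p_i), the maximal payoff obtained via Lemma 4.7."""
    return sum(sqrt(p) for p in ps) / len(ps)
```

For the instance $d = (2, 3)$, player 3 prefers to enter the copies when $k = 3$ but not when $k = 4$.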



Remark 4.8. The positive results of Sections 4.1 and 4.2 can easily be extended to equilibria in pure or randomised strategies with a memory of a fixed size $k \in \mathbb{N}$: a nondeterministic algorithm can guess a memory $\mathfrak{M}$ of size $k$ and then look for a positional, respectively stationary, equilibrium in the product of the original game $\mathcal{G}$ with the memory $\mathfrak{M}$. Hence, for any fixed $k \in \mathbb{N}$, we can decide in Pspace (NP) the existence of a randomised (pure) equilibrium of size $k$ with payoff $\geq x$ and $\leq y$.



4.3. Pure and randomised equilibria. In this section, we show that the problems NE and PureNE are undecidable, by exhibiting a reduction from an undecidable problem about two-counter machines. Our construction is inspired by a construction used by Brázdil et al. [13] to prove the undecidability of stochastic games with branching-time objectives (see Remark 4.12 below).

Let $\Gamma := \{\mathrm{inc}(j), \mathrm{dec}(j), \mathrm{zero}(j) : j = 1, 2\}$ (the set of instructions). A two-counter machine $\mathcal{M}$ is of the form $\mathcal{M} = (Q, q_0, \delta)$, where

- $Q$ is a finite set of states,

- $q_0 \in Q$ is the initial state, and

- $\delta \subseteq Q \times \Gamma \times Q$ is the transition relation.

For $q \in Q$ let $\delta(q) := \{(\gamma, q') \in \Gamma \times Q : (q, \gamma, q') \in \delta\}$. We call $\mathcal{M}$ deterministic if for each $q \in Q$ either $\delta(q) = \emptyset$, or $\delta(q) = \{(\mathrm{inc}(j), q')\}$ for some $j \in \{1, 2\}$ and $q' \in Q$, or $\delta(q) = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ for some $j \in \{1, 2\}$ and $q_1, q_2 \in Q$.

A configuration of $\mathcal{M}$ is a triple $C = (q, i_1, i_2) \in Q \times \mathbb{N} \times \mathbb{N}$, where $q$ denotes the current state and $i_j$ denotes the current value of counter $j$. A configuration $C' = (q', i_1', i_2')$ is a successor of configuration $C = (q, i_1, i_2)$, denoted by $C \vdash C'$, if there exists a "matching" transition $(q, \gamma, q') \in \delta$. For example, $(q, i_1, i_2) \vdash (q', i_1 + 1, i_2)$ if and only if $(q, \mathrm{inc}(1), q') \in \delta$. The instruction $\mathrm{zero}(j)$ performs a zero test: $(q, i_1, i_2) \vdash (q', i_1, i_2)$ if and only if $i_1 = 0$ and $(q, \mathrm{zero}(1), q') \in \delta$, or $i_2 = 0$ and $(q, \mathrm{zero}(2), q') \in \delta$.

A partial computation of $\mathcal{M}$ is a finite or infinite sequence $\rho = \rho(0)\rho(1)\dots$ of configurations such that $\rho(0) \vdash \rho(1) \vdash \cdots$ and $\rho(0) = (q_0, 0, 0)$ (the initial configuration). A partial computation of $\mathcal{M}$ is a computation of $\mathcal{M}$ if it is infinite or ends in a configuration $C$ for which there exists no successor configuration. Note that each deterministic two-counter machine has a unique computation.
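The definitions above translate directly into a small simulator. The sketch below uses a hypothetical encoding of $\delta$ as a dictionary from states to lists of triples `(op, j, next_state)` and assumes, as the pairing of $\mathrm{zero}(j)$ with $\mathrm{dec}(j)$ suggests, that $\mathrm{dec}(j)$ is enabled only when counter $j$ is positive. It computes a prefix of the unique computation of a deterministic machine.

```python
def run(delta, q0, max_steps):
    """Return (trace, halted): the first configurations of the computation
    and whether it ended within max_steps steps."""
    q, counters = q0, [0, 0]
    trace = [(q, counters[0], counters[1])]
    for _ in range(max_steps):
        step = None
        for op, j, q_next in delta.get(q, []):
            c = counters[j - 1]
            if op == "inc":
                step = (q_next, j, c + 1)
            elif op == "dec" and c > 0:
                step = (q_next, j, c - 1)
            elif op == "zero" and c == 0:
                step = (q_next, j, c)
        if step is None:
            return trace, True  # no successor configuration exists
        q, j, c = step
        counters[j - 1] = c
        trace.append((q, counters[0], counters[1]))
    return trace, False


# Increment counter 1 twice, then count it down to zero and stop.
machine = {
    "q0": [("inc", 1, "q1")],
    "q1": [("inc", 1, "q2")],
    "q2": [("zero", 1, "halt"), ("dec", 1, "q2")],
}
# The machine with the single transition (q0, inc(1), q0) never halts.
looping = {"q0": [("inc", 1, "q0")]}
```

Since halting is undecidable, `max_steps` is only a cut-off: a result with `halted == False` does not show that the computation is infinite.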

The halting problem is to decide, given a machine $\mathcal{M}$, whether $\mathcal{M}$ has a finite computation. It is well-known that deterministic two-counter machines are Turing powerful, which makes the halting problem and its dual, the non-halting problem, undecidable, even when restricted to deterministic two-counter machines. In fact, the non-halting problem for deterministic two-counter machines is not recursively enumerable.

Theorem 4.9. NE and PureNE are not recursively enumerable, even for 10-player SSMGs. 

To prove Theorem 4.9, we give a reduction from the non-halting problem for deterministic two-counter machines. Our aim is thus to compute from a machine $\mathcal{M}$ a 10-player SSMG $(\mathcal{G}, v_0)$ such that the computation of $\mathcal{M}$ is infinite if and only if $(\mathcal{G}, v_0)$ has a (pure) Nash equilibrium in which player 0 wins almost surely. Without loss of generality, we assume that in $\mathcal{M}$ there is no zero test that is followed by another zero test: if $(\mathrm{zero}(j), q') \in \delta(q)$, then $|\delta(q')| \leq 1$.

The game $\mathcal{G}$ is played by players 0, 1 and eight other players $A^t_j$ and $B^t_j$, indexed by $j \in \{1, 2\}$ and $t \in \{0, 1\}$. Intuitively, player 0 and player 1 build up the computation of $\mathcal{M}$: player 0 updates the counters, and player 1 chooses transitions. The other players make sure that player 0 updates the counters correctly: players $A^0_j$ and $A^1_j$ ensure that, in each step, the value of counter $j$ is not too high, and players $B^0_j$ and $B^1_j$ ensure that, in each step, the value of counter $j$ is not too low. More precisely, $A^0_j$ and $B^0_j$ monitor the odd steps of the computation, while $A^1_j$ and $B^1_j$ monitor the even steps.

Let $\Gamma' := \Gamma \cup \{\mathrm{init}\}$. For each $q \in Q$, each $\gamma \in \Gamma'$, each $j \in \{1, 2\}$ and each $t \in \{0, 1\}$, the game $\mathcal{G}$ contains the gadgets $S^t_{\gamma,q}$, $I^t_q$ and $C^t_{\gamma,j}$, which are depicted in Fig. 8. For better readability, terminal vertices are depicted as squares; the label indicates which players win. The initial vertex of $\mathcal{G}$ is $v_0 := v^0_{\mathrm{init},q_0}$. Note that in the gadget $S^t_{\gamma,q}$, each of the players $A^t_j$ and $B^t_j$ may quit the game, which gives her a payoff of $\frac{1}{3}$ or $\frac{1}{6}$, respectively, but payoff 0 to players 0 and 1.

It will turn out that player 1 will play a pure strategy in any Nash equilibrium of $(\mathcal{G}, v_0)$ where player 0 wins almost surely, except possibly for histories that are not consistent with the equilibrium. Formally, we say that a strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ is safe if for all histories $xv$ consistent with $\sigma$ and ending in a vertex $v \in I^t_q$ there exists $w \in V$ with $\sigma_1(w \mid xv) = 1$.

For any safe strategy profile $\sigma$ of $\mathcal{G}$ where player 0 wins almost surely, let $x_0 v_0 \prec x_1 v_1 \prec x_2 v_2 \prec \cdots$ (where $x_i \in V^*$, $v_i \in V$ and $x_0 = \varepsilon$) be the unique sequence containing all histories $xv$ of $(\mathcal{G}, v_0)$ that are consistent with $\sigma$ and end in a vertex $v$ of the form $v = v^t_{\gamma,q}$. This sequence is infinite because player 0 wins almost surely. Additionally, let $q_0, q_1, \dots$ and $\gamma_0, \gamma_1, \dots$ be the corresponding sequences of states and instructions, respectively, i.e. $v_n = v^0_{\gamma_n,q_n}$ or $v_n = v^1_{\gamma_n,q_n}$ for all $n \in \mathbb{N}$. For each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, we set:

$$a^n_j := \Pr^{\sigma}_{v_0}\big(\text{player } A^{n \bmod 2}_j \text{ wins} \mid x_n v_n \cdot V^{\omega}\big),$$

$$b^n_j := \Pr^{\sigma}_{v_0}\big(\text{player } B^{n \bmod 2}_j \text{ wins} \mid x_n v_n \cdot V^{\omega}\big).$$

Note that at every terminal vertex of the counter gadgets $C^t_{\gamma,j}$ and $C^{1-t}_{\gamma,j}$ either player $A^t_j$ or player $B^t_j$ wins. For each $j$, the conditional probability that, given the history $x_n v_n$, we reach such a vertex is $\sum_{k \in \mathbb{N}} \frac{1}{2^k} \cdot \frac{1}{4} = \frac{1}{2}$. Hence, $a^n_j = \frac{1}{2} - b^n_j$ for all $n \in \mathbb{N}$. We say that $\sigma$ is stable if $a^n_j = \frac{1}{3}$ or, equivalently, $b^n_j = \frac{1}{6}$ for each $j \in \{1, 2\}$ and for all $n \in \mathbb{N}$.

Figure 8. Simulating a two-counter machine. (Recoverable panel titles: the gadget $I^t_q$ for $\delta(q) = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ and the gadget $C^t_{\gamma,j}$ for $\gamma \in \{\mathrm{init}, \mathrm{zero}(j)\}$.)

Finally, for each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, we define a number $c^n_j \in [0, 1]$ as follows: After the history $x_n v_n$, with probability $\frac{1}{4}$ the play enters the counter gadget $C^{n \bmod 2}_{\gamma_n,j}$. The number $c^n_j$ is defined as the probability of subsequently reaching a grey-coloured vertex. Note that, by the construction of $\mathcal{G}$, it holds that $c^n_j = 1$ if $\gamma_n = \mathrm{zero}(j)$ or $\gamma_n = \mathrm{init}$; in particular, $c^0_1 = c^0_2 = 1$.

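Lemma 4.10. Let $\sigma$ be a safe strategy profile of $(\mathcal{G}, v_0)$ in which player 0 wins almost surely. Then $\sigma$ is stable if and only if

$$c^{n+1}_j = \begin{cases} \frac{1}{2} \cdot c^n_j & \text{if } \gamma_{n+1} = \mathrm{inc}(j), \\ 2 \cdot c^n_j & \text{if } \gamma_{n+1} = \mathrm{dec}(j), \\ c^n_j = 1 & \text{if } \gamma_{n+1} = \mathrm{zero}(j), \\ c^n_j & \text{otherwise,} \end{cases} \tag{4.1}$$

for each $j \in \{1, 2\}$ and $n \in \mathbb{N}$.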

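To prove the lemma, consider a safe strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ in which player 0 wins almost surely. For each $j \in \{1, 2\}$ and $n \in \mathbb{N}$, set

$$p^n_j := \Pr^{\sigma}_{v_0}\big(\text{player } A^{n \bmod 2}_j \text{ wins} \mid x_n v_n \cdot V^{\omega} \setminus x_{n+2} v_{n+2} \cdot V^{\omega}\big).$$

The following claim relates the numbers $a^n_j$ and $p^n_j$.

Claim. Let $j \in \{1, 2\}$. Then $a^n_j = \frac{1}{3}$ for all $n \in \mathbb{N}$ if and only if $p^n_j = \frac{1}{4}$ for all $n \in \mathbb{N}$.

Proof. (⇒) Assume that $a^n_j = \frac{1}{3}$ for all $n \in \mathbb{N}$. We have $a^n_j = p^n_j + \frac{1}{4} \cdot a^{n+2}_j$ and therefore $\frac{1}{3} = p^n_j + \frac{1}{12}$ for all $n \in \mathbb{N}$. Hence, $p^n_j = \frac{1}{4}$ for all $n \in \mathbb{N}$.

(⇐) Assume that $p^n_j = \frac{1}{4}$ for all $n \in \mathbb{N}$. Since $a^n_j = \frac{1}{4} + \frac{1}{4} \cdot a^{n+2}_j$ for all $n \in \mathbb{N}$, the numbers $a^n_j$ must satisfy the following recurrence: $a^{n+2}_j = 4a^n_j - 1$. Since all the numbers $a^n_j$ are probabilities, $0 \leq a^n_j \leq 1$ for all $n \in \mathbb{N}$. It is easy to see that the only values for $a^0_j$ and $a^1_j$ such that $0 \leq a^n_j \leq 1$ for all $n \in \mathbb{N}$ are $a^0_j = a^1_j = \frac{1}{3}$. But this implies that $a^n_j = \frac{1}{3}$ for all $n \in \mathbb{N}$. □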
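The step "it is easy to see" rests on the fact that $\frac{1}{3}$ is a repelling fixed point of $a \mapsto 4a - 1$: a deviation $e$ from $\frac{1}{3}$ grows to $4^m e$ after $m$ applications. A minimal sketch with exact rationals (the function name and the finite horizon `steps` are ours) makes this visible.

```python
from fractions import Fraction


def stays_in_unit_interval(a0, steps=50):
    """Iterate a -> 4a - 1 and report whether all iterates stay in [0, 1]."""
    a = Fraction(a0)
    for _ in range(steps):
        if not 0 <= a <= 1:
            return False
        a = 4 * a - 1
    return 0 <= a <= 1
```

Only the starting value $\frac{1}{3}$ survives; even a perturbation of $10^{-9}$ leaves the unit interval within the chosen horizon.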

Proof of Lemma 4.10. By the previous claim, we only need to show that $p^n_j = \frac{1}{4}$ if and only if (4.1) holds. Let $j \in \{1, 2\}$, $n \in \mathbb{N}$ and $t = n \bmod 2$. The probability $p^n_j$ can be expressed as the sum of the probability that the play reaches a terminal vertex that is winning for player $A^t_j$ inside $C^t_{\gamma_n,j}$ and the probability that the play reaches such a vertex inside $C^{1-t}_{\gamma_{n+1},j}$. The first probability does not depend on $\gamma_n$, but the second depends on $\gamma_{n+1}$. Let us consider the case that $\gamma_{n+1} = \mathrm{inc}(j)$. In this case,

$$p^n_j = \frac{1}{4} \cdot \Big(1 - \frac{1}{4} \cdot c^n_j\Big) + \frac{1}{8} \cdot c^{n+1}_j = \frac{1}{4} - \frac{1}{16} \cdot c^n_j + \frac{1}{8} \cdot c^{n+1}_j.$$

Obviously, this sum is equal to $\frac{1}{4}$ if and only if $c^{n+1}_j = \frac{1}{2} \cdot c^n_j$. For any other value of $\gamma_{n+1}$, the argumentation is similar. □
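The increment case, together with the way (4.1) encodes a counter value $i$ as $c = 1/2^i$, can be replayed with exact rationals. In the sketch below, `p_inc` is the expression for $p^n_j$ from the increment case, while `next_c` and `counter_value` are our own helper names for the update rule (4.1) and its decoding.

```python
from fractions import Fraction


def p_inc(c_n, c_next):
    """p^n_j in the case gamma_{n+1} = inc(j)."""
    return Fraction(1, 4) * (1 - Fraction(1, 4) * c_n) + Fraction(1, 8) * c_next


def next_c(c, op):
    """Update rule (4.1) for the counter addressed by the instruction."""
    if op == "inc":
        return c / 2
    if op == "dec":
        return 2 * c
    return c  # a zero test leaves c unchanged (and requires c == 1)


def counter_value(c):
    """Decode c = 1 / 2^i into the counter value i."""
    i = 0
    while c < 1:
        c, i = 2 * c, i + 1
    return i
```

Starting from $c = 1$, two increments followed by a decrement give $c = \frac{1}{2}$, i.e. counter value 1.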

To establish the reduction, we need to show that the following statements are equivalent: 

(1) the computation of $\mathcal{M}$ is infinite;

(2) $(\mathcal{G}, v_0)$ has a pure Nash equilibrium in which player 0 wins almost surely;

(3) $(\mathcal{G}, v_0)$ has a Nash equilibrium in which player 0 wins almost surely.

(1 ⇒ 2) Assume that the computation $\rho = \rho(0)\rho(1)\dots$ of $\mathcal{M}$ is infinite. We define a pure strategy profile $\sigma$ as follows: (1) For a history that ends at the unique vertex $v \in C^t_{\gamma,j}$ controlled by player 0 after visiting a vertex of the form $v^0_{\gamma',q}$ or $v^1_{\gamma',q}$ exactly $n > 0$ times and $v$ exactly $k \geq 0$ times, player 0 plays to the grey-coloured successor vertex if $k$ is greater than or equal to the value of counter $j$ in configuration $\rho(n - 1)$; otherwise, player 0 plays to the other successor vertex. (2) For a history that ends in one of the instruction gadgets $I^t_q$ for $\delta(q) = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ after visiting a vertex of the form $v^0_{\gamma',q'}$ or $v^1_{\gamma',q'}$ exactly $n > 0$ times, player 1 plays to $S^{1-t}_{\mathrm{zero}(j),q_1}$ if the value of counter $j$ in configuration $\rho(n - 1)$ is zero and to $S^{1-t}_{\mathrm{dec}(j),q_2}$ if the value of counter $j$ in configuration $\rho(n - 1)$ is not zero. (3) Any other player's pure strategy is defined as follows: after a history ending in $S^t_{\gamma,q}$ the strategy prescribes to quit the game if and only if the history is not compatible with $\rho$ (i.e. if the corresponding sequence of instructions does not match $\rho$).

Note that the resulting strategy profile $\sigma$ is safe. Moreover, since the players follow the computation of $\mathcal{M}$, a terminal vertex inside one of the counter gadgets $C^t_{\gamma,j}$ is reached with probability 1 in $\sigma$. Hence, player 0 wins almost surely. Moreover, by the definition of $\sigma$, (4.1) holds, and we can conclude from Lemma 4.10 that $\sigma$ is stable. We claim that $\sigma$ is, in fact, a Nash equilibrium of $(\mathcal{G}, v_0)$: It is obvious that player 0 cannot improve her payoff. If player 1 deviates, we reach a history that is not compatible with $\rho$. Hence, one of the players $A^t_j$ or $B^t_j$ will quit the game, which ensures that player 1 will not receive a higher payoff. Finally, since $\sigma$ is stable, none of the players $A^t_j$ or $B^t_j$ can improve her payoff.

(2 ⇒ 3) Trivial.

(3 ⇒ 1) Assume that $\sigma$ is a Nash equilibrium of $(\mathcal{G}, v_0)$ in which player 0 wins almost surely. In order to apply Lemma 4.10, we first prove that $\sigma$ is safe. By contradiction, assume that there exists a history $xv$ consistent with $\sigma$ and ending in a vertex $v \in I^t_q$ such that $\sigma_1(xv)$ assigns positive probability to two distinct successor vertices. Hence, $\delta(q) = \{(\mathrm{zero}(j), q_1), (\mathrm{dec}(j), q_2)\}$ for some $j \in \{1, 2\}$ and $q_1, q_2 \in Q$. By our assumption that there are no consecutive zero tests and since player 0 wins almost surely,

Pr^„ (player 1 wins | xv ■ vl~l^^^^^^ ■ > \ , 

but 

Pr; (player 1 wins | xv • vl;^\^^^^^ ■ V^) < h ■ 

^zcro{j),gi 



Hence, player 1 could improve her payoff by playing to vl K-s with probability 1, a con- 



tradiction to a being a Na sh equilibr ium. 

To apply Lemma 4.10 and obtain (4.1), it remains to be shown that $\sigma$ is stable. In order to derive a contradiction, assume that there exists $j \in \{1, 2\}$ and $n \in \mathbb{N}$ such that either $a^n_j < \frac{1}{3}$ or $a^n_j > \frac{1}{3}$ (i.e. $b^n_j < \frac{1}{6}$). In the former case, player $A^{n \bmod 2}_j$ could improve her payoff by quitting the game after history $x_n v_n$, while in the latter case, player $B^{n \bmod 2}_j$ could improve her payoff by quitting the game, again a contradiction to $\sigma$ being a Nash equilibrium.

From $c^0_j = 1$ and (4.1), it follows that each $c^n_j$ is of the form $c^n_j = 1/2^i$ where $i \in \mathbb{N}$. We denote by $i^n_j$ the unique number $i$ such that $c^n_j = 1/2^i$ and set $\rho(n) = (q_n, i^n_1, i^n_2)$ for each $n \in \mathbb{N}$. We claim that $\rho := \rho(0)\rho(1)\dots$ is in fact the computation of $\mathcal{M}$. In particular, this computation is infinite. It suffices to verify the following two properties:

- $\rho(0) = (q_0, 0, 0)$;

- $\rho(n) \vdash \rho(n + 1)$ for all $n \in \mathbb{N}$.

The first property is immediate. To prove the second property, let $\rho(n) = (q, i_1, i_2)$ and $\rho(n + 1) = (q', i_1', i_2')$. Hence, $v_n$ lies inside $S^t_{\gamma,q}$ and $v_{n+1}$ lies inside $S^{1-t}_{\gamma',q'}$ for suitable $\gamma, \gamma'$ and $t = n \bmod 2$. We only prove the claim for $\delta(q) = \{(\mathrm{zero}(1), q_1), (\mathrm{dec}(1), q_2)\}$; the other cases are similar. Note that, by the construction of the gadget $I^t_q$, it must be the case that either $q' = q_1$ and $\gamma' = \mathrm{zero}(1)$, or $q' = q_2$ and $\gamma' = \mathrm{dec}(1)$. By (4.1), if $\gamma' = \mathrm{zero}(1)$, then $i_1' = i_1 = 0$ and $i_2' = i_2$, and if $\gamma' = \mathrm{dec}(1)$, then $i_1' = i_1 - 1$ and $i_2' = i_2$. This implies $\rho(n) \vdash \rho(n + 1)$: on the one hand, if $i_1 = 0$, then $i_1' \neq i_1 - 1$, which implies $\gamma' \neq \mathrm{dec}(1)$ and thus $\gamma' = \mathrm{zero}(1)$, $q' = q_1$ and $i_1' = i_1 = 0$; on the other hand, if $i_1 > 0$, then $\gamma' \neq \mathrm{zero}(1)$ and thus $\gamma' = \mathrm{dec}(1)$, $q' = q_2$ and $i_1' = i_1 - 1$. □

Remark 4.11. For the problem PureNE, we can strengthen Theorem 4.9 slightly by showing undecidability already for 9-player SSMGs. This can be achieved by merging player 0 and player 1 in the game described in the proof of Theorem 4.9.

Remark 4.12. The proof of Theorem 4.9 can also be viewed as a proof for the undecidability of a problem about the logic PCTL (probabilistic computation tree logic), introduced by Hansson and Jonsson [38]. PCTL is evaluated over labelled Markov chains and replaces the universal and existential path quantifiers of CTL by a family of probabilistic quantifiers $P_{\sim x}$, where $\sim$ is a comparison operator and $x \in [0, 1]$ is a rational probability. For example, the formula $P_{=1/2} \mathrm{F}\, Q$ holds in state $v$ if (and only if) the probability of reaching a state labelled with $Q$ from $v$ equals $\frac{1}{2}$.

Brázdil et al. [13] proved the undecidability of the following problem: given a labelled Markov decision process $(\mathcal{G}, v_0)$ and a PCTL formula $\varphi$, decide whether the controller has a strategy $\sigma$ such that the Markov chain $(\mathcal{G}^{\sigma}, v_0)$ is a model of $\varphi$. We can prove a stronger result, namely that there exists a fixed PCTL formula $\varphi$, which only contains the quantifiers $P_{=x} \mathrm{F}$ and $P_{=x} \mathrm{G}$, for which the problem is undecidable. It suffices to add propositions $A^0_1$, $A^1_1$, $A^0_2$, $A^1_2$, $Q$, $Q_1$, $Q_2$, $T$, $Z_0$ and $Z_1$ according to the following rules:

(1) if $v$ is a terminal vertex that is winning for player $A \in \{A^0_1, A^1_1, A^0_2, A^1_2\}$, then label $v$ with $A$;

(2) If V is controlled by player and \vA\ = 2, then label v with Q and label one of its 
successors with Qi and the other with Q2- 

(3) if $v$ is a terminal vertex that is winning for player 0, then label $v$ with $T$;

(4) if V = ^, then label v with Zq; ii v = v}^ then label v with Zi. 

To obtain an MDP, we make all non-stochastic vertices controlled by player 0. Finally, the 
PCTL formula for which we prove undecidability is 

P=^F T A /\ P=^G (Zt ^ /\ P=V3f a^'^ a P=^G (Q ^ V P"^^ Qi) ■ 

t=0,l j=l,2 i=0,l 

The first part of the formula states that player 0 wins almost surely, the second part requires the strategy to be stable, and the last part of the formula requires the strategy to be safe.



4.4. Finite-state equilibria. We can use the construction in the proof of Theorem 4.9 to show that Nash equilibria may require infinite memory, even if we are only interested in whether a player wins with probability 0 or 1.

Proposition 4.13. There exists a finite SSMG that has a pure Nash equilibrium where player 0 wins almost surely, but that has no finite-state Nash equilibrium where player 0 wins with positive probability.



Proof. Consider the game $(\mathcal{G}, v_0)$ constructed in the proof of Theorem 4.9 for the machine $\mathcal{M}$ with the single transition $(q_0, \mathrm{inc}(1), q_0)$. We modify this game by adding a new initial vertex $v_1$ which is controlled by a new player, player 2, and from where she can either move to $v_0$ or to a new terminal vertex where she receives payoff 1 and every other player receives payoff 0. Additionally, player 2 wins at every terminal vertex of the game $\mathcal{G}$ that is winning for player 0. Let us denote the modified game by $\mathcal{G}'$.

Since the computation of $\mathcal{M}$ is infinite, the game $(\mathcal{G}, v_0)$ has a pure Nash equilibrium where player 0 wins almost surely. This equilibrium induces a pure Nash equilibrium of $(\mathcal{G}', v_1)$ where both player 0 and player 2 win almost surely.

Now assume that there exists a finite-state Nash equilibrium of $(\mathcal{G}', v_1)$ where player 0 wins with positive probability. Such an equilibrium induces a finite-state Nash equilibrium $\sigma$ of $(\mathcal{G}, v_0)$ where player 2, and thus also player 0, wins almost surely; otherwise, player 2 would prefer to play from $v_1$ to the new terminal vertex. Using the same notation as in the proof of Theorem 4.9, it follows from Lemma 4.10 that $c^n_1 = 1/2^n$ for each $n \in \mathbb{N}$. But this is impossible if $\sigma$ is a finite-state strategy profile. □
Propositions 4.1 and 4.13 imply that the decision problems NE, FinNE, PureNE and PureFinNE are pairwise distinct. Another way to see that PureNE and PureFinNE are distinct is to observe that PureFinNE is recursively enumerable: to decide whether an SMG $(\mathcal{G}, v_0)$ has a pure finite-state Nash equilibrium with payoff $\geq x$ and $\leq y$, one can just enumerate all possible pure finite-state profiles $\sigma$ and check for each of them whether it constitutes a Nash equilibrium with the desired properties by analysing the finite Markov chain $\mathcal{G}^{\sigma}$ and the finite MDPs $\mathcal{G}^{\sigma_{-i}}$. Hence, to prove that PureFinNE is undecidable, we cannot reduce from the non-halting problem. Instead, we reduce from the halting problem (which is recursively enumerable itself). The same reduction proves that FinNE is undecidable.

Theorem 4.14. FinNE and PureFinNE are undecidable, even for 14-player SSMGs. 

Proof sketch. The construction is similar to the one for proving the undecidability of NE. Given a two-counter machine $\mathcal{M}$, we modify the SSMG $\mathcal{G}$ constructed in the proof of Theorem 4.9 by adding another counter (together with four more players for checking whether the counter is updated correctly) that has to be incremented in each step. Moreover, the gadget $I^t_q$ for $\delta(q) = \emptyset$ is replaced by the gadget shown in Fig. 9, and a new instruction halt is added, together with a suitable gadget $C^t_{\mathrm{halt},j}$, also depicted in Fig. 9. Let us denote the new game by $\mathcal{G}'$. If $\mathcal{M}$ does not halt, any Nash equilibrium of $(\mathcal{G}', v_0)$ where player 0 wins with probability 1 needs infinite memory: to win almost surely, player 0 must follow the computation of $\mathcal{M}$ and increment the new counter at each step, which requires infinite memory. On the other hand, if $\mathcal{M}$ halts, there exists a pure finite-state Nash equilibrium of $(\mathcal{G}', v_0)$ in which player 0 wins almost surely. (The arguments for the existence of such an equilibrium are the same as in the proof of Theorem 4.9: since $\mathcal{M}$ halts, the equilibrium can be implemented with finite memory.) □

Remark 4.15. With the same reasoning as for PureNE, we can eliminate one player in the 
reduction for PureFinNE. Hence, this problem is already undecidable for SSMGs with 
13 players.

Figure 9. Reducing from the halting problem.



5. The strictly qualitative fragment 

In this section, we prove that the fragment of NE that arises from restricting the thresholds
to be the same binary payoff (i.e. each entry is either 0 or 1) is decidable for games with
$\omega$-regular objectives; we denote this problem by StrQualNE. Formally, StrQualNE is defined
as follows:

Given an SMG $(\mathcal{G}, v_0)$ and $x \in \{0,1\}^{\Pi}$, decide whether $(\mathcal{G}, v_0)$ has a Nash
equilibrium with payoff $x$.

To prove decidability, we first characterise the existence of a Nash equilibrium with a binary
payoff in games with prefix-independent objectives.

5.1. Characterisation of existence. Given an SMG $\mathcal{G}$ and a player $i$, we denote by $W_i$
the set of all vertices $v \in V$ such that $\mathrm{val}_i^{\mathcal{G}}(v) > 0$.

Proposition 5.1. Let $(\mathcal{G}, v_0)$ be any SMG with prefix-independent objectives, and let
$x \in \{0,1\}^{\Pi}$. Then the following statements are equivalent:

(1) $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$;

(2) there exists a strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ with payoff $x$ such that
$\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with $x_i = 0$;

(3) there exists a pure strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ with payoff $x$ such that
$\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with $x_i = 0$;

(4) $(\mathcal{G}, v_0)$ has a pure Nash equilibrium with payoff $x$.

If additionally all objectives are $\omega$-regular, then each of the above statements is equivalent
to each of the following statements:

(5) there exists a pure finite-state strategy profile $\sigma$ with payoff $x$ such that
$\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with $x_i = 0$;

(6) $(\mathcal{G}, v_0)$ has a pure finite-state Nash equilibrium with payoff $x$.

Proof. ($1 \Rightarrow 2$) Let $\sigma$ be a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $x$. We claim that $\sigma$ is
already the strategy profile we are looking for: $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with
$x_i = 0$. Let $i \in \Pi$ be a player with $x_i = 0$. By Lemma 3.3 and since $\mathrm{Win}_i$ is prefix-independent,
we have $0 = \Pr_{v_0}^{\sigma}(\mathrm{Win}_i \mid xv \cdot V^{\omega}) \geq \mathrm{val}_i^{\mathcal{G}}(v)$ for all histories $xv$ that are
consistent with $\sigma$. Hence, $v \in V \setminus W_i$ for all such histories $xv$, and $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$.

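($2 \Rightarrow 3$) Let $\sigma$ be a strategy profile of $(\mathcal{G}, v_0)$ with payoff $x$ such that
$\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with $x_i = 0$. Consider the MDP $\mathcal{M}$ that is obtained
from $\mathcal{G}$ by removing all vertices $v \in V$ such that $v \in W_i$ for some player $i$ with $x_i = 0$,
merging all players into one, and imposing the objective

$$\mathrm{Win} = \bigcap_{i \in \Pi,\, x_i = 1} \mathrm{Win}_i \;\cap \bigcap_{i \in \Pi,\, x_i = 0} \bigl(V^{\omega} \setminus \mathrm{Win}_i\bigr).$$

The MDP $\mathcal{M}$ is well-defined since its domain is a subarena of $\mathcal{G}$. Moreover, the value
$\mathrm{val}^{\mathcal{M}}(v_0)$ of $\mathcal{M}$ from $v_0$ equals 1 because the strategy profile $\sigma$ induces a strategy $\bar{\sigma}$ in $\mathcal{M}$
satisfying $\Pr_{v_0}^{\bar{\sigma}}(\mathrm{Win}) = 1$. Since each of the objectives $\mathrm{Win}_i$ is prefix-independent, so is the
objective $\mathrm{Win}$. Hence, by Theorem 2.5, $(\mathcal{M}, v_0)$ admits an optimal pure strategy $\tau$. Since
$\mathrm{val}^{\mathcal{M}}(v_0) = 1$, we have $\Pr_{v_0}^{\tau}(\mathrm{Win}) = 1$, and $\tau$ induces a pure strategy profile of $(\mathcal{G}, v_0)$ with
the desired properties.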

($3 \Rightarrow 4$) Consider any pure strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ with payoff $x$ such that
$\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$ for each player $i$ with $x_i = 0$. We show that $\sigma$ is favourable:
$\Pr_{v_0}^{\sigma}(\mathrm{Win}_i \mid xv \cdot V^{\omega}) \geq \mathrm{val}_i^{\mathcal{G}}(v)$ for each player $i$ and each history $xv$ of $(\mathcal{G}, v_0)$ that is
consistent with $\sigma$. There are two cases: If $x_i = 1$, then $\Pr_{v_0}^{\sigma}(\mathrm{Win}_i \mid xv \cdot V^{\omega}) = 1$ for all histories $xv$
consistent with $\sigma$, and the inequality holds. Otherwise, $x_i = 0$ and $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(W_i)) = 0$.
Hence, $\mathrm{val}_i^{\mathcal{G}}(v) = 0$ for all histories $xv$ consistent with $\sigma$, and the inequality holds as well.
Now, by Lemma [s^, we can extend $\sigma$ to a pure Nash equilibrium with payoff $x$.

($4 \Rightarrow 1$) Trivial.

Under the additional assumption that all objectives are $\omega$-regular, the implications
($2 \Rightarrow 5$) and ($5 \Rightarrow 6$) are proven analogously (using Lemma El instead of Lemma ES); the
implication ($6 \Rightarrow 1$) is trivial. □

As an immediate consequence of Proposition 5.1, we can conclude that pure finite-state
strategies are as powerful as arbitrary randomised strategies as far as the existence of Nash
equilibria with binary payoffs in finite SMGs with $\omega$-regular objectives is concerned.

Corollary 5.2. Let $(\mathcal{G}, v_0)$ be a finite SMG with $\omega$-regular objectives, and let $x \in \{0,1\}^{\Pi}$.
There exists a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $x$ if and only if there exists a pure
finite-state Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $x$.

Proof. The claim follows from Proposition 5.1 and the fact that every SMG with $\omega$-regular
objectives can be reduced to one with parity objectives (using finite memory). □



5.2. Computational Complexity. We can now give an algorithm that decides StrQualNE
for SMGs with Muller objectives. The algorithm relies on Proposition 5.1, which allows us
to reduce StrQualNE to an MDP problem.

Formally, given a Muller SMG $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, \chi, (\mathcal{F}_i)_{i \in \Pi})$ and a binary payoff
$x = (x_i)_{i \in \Pi}$, we define the Markov decision process $\mathcal{G}(x)$ as follows: Let $Z \subseteq V$ be the set of
all vertices $v$ such that $\mathrm{val}_i^{\mathcal{G}}(v) = 0$ for each player $i$ with $x_i = 0$; the set of vertices of $\mathcal{G}(x)$ is
precisely the set $Z$, with the set of vertices controlled by player 0 being $Z_0 := \bigcup_{i \in \Pi} (V_i \cap Z)$; if
$Z = \emptyset$, we define $\mathcal{G}(x)$ to be a trivial MDP with the empty set as its objective. The transition
relation of $\mathcal{G}(x)$ is the restriction of $\Delta$ to transitions between $Z$-states. Note that the
transition relation of $\mathcal{G}(x)$ is well-defined since $Z$ is a subarena of $\mathcal{G}$. Finally, the single
objective in $\mathcal{G}(x)$ is $\mathrm{Reach}(T)$, where $T \subseteq Z$ is the union of all end components $U \subseteq Z$ with
payoff $x$.

Lemma 5.3. Let $(\mathcal{G}, v_0)$ be a finite Muller SMG, and let $x \in \{0,1\}^{\Pi}$. Then $(\mathcal{G}, v_0)$ has a
Nash equilibrium with payoff $x$ if and only if $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$.

Proof. ($\Rightarrow$) Assume that $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$. By Proposition 5.1,
there exists a strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ with payoff $x$ such that $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(V \setminus Z)) = 0$.
We claim that $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(T)) = 1$. Otherwise, by Lemma 2.2, there would exist an end
component $U \subseteq Z$ such that $\Pr_{v_0}^{\sigma}(\{\pi : \mathrm{Inf}(\pi) = U\}) > 0$, and $U$ is either not winning
for some player $i$ with $x_i = 1$ or it is winning for some player $i$ with $x_i = 0$. But then
$\sigma$ cannot have payoff $x$, a contradiction. Now, since $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(V \setminus Z)) = 0$, the strategy
profile $\sigma$ induces a strategy $\bar{\sigma}$ in $\mathcal{G}(x)$ such that $\Pr_{v_0}^{\bar{\sigma}}(X) = \Pr_{v_0}^{\sigma}(X)$ for every Borel set
$X \subseteq Z^{\omega}$. In particular, $\Pr_{v_0}^{\bar{\sigma}}(\mathrm{Reach}(T)) = 1$ and hence $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$.

($\Leftarrow$) Assume that $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$ (in particular, $v_0 \in Z$), and let $\bar{\sigma}$ be an optimal
strategy in $(\mathcal{G}(x), v_0)$. From $\bar{\sigma}$, using Lemma 2.3, we can devise a strategy $\bar{\sigma}'$ such that
$\Pr_{v_0}^{\bar{\sigma}'}(\{\pi : \mathrm{Inf}(\pi) \text{ has payoff } x\}) = 1$. Finally, $\bar{\sigma}'$ can be extended to a strategy
profile $\sigma$ of $(\mathcal{G}, v_0)$ with payoff $x$ such that $\Pr_{v_0}^{\sigma}(\mathrm{Reach}(V \setminus Z)) = 0$. By Proposition 5.1,
this implies that $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$. □

Since the values of an MDP with a reachability objective can be computed in polynomial
time, the difficult part lies in computing the MDP $\mathcal{G}(x)$ from $\mathcal{G}$ and $x$ (i.e. its domain $Z$
and the target set $T$). For Muller SMGs, polynomial space suffices to achieve this. In fact,
StrQualNE is PSPACE-complete for these games.

Theorem 5.4. StrQualNE is PSPACE-complete for Muller SMGs.

Proof. Hardness follows from Theorem 2.11. To prove membership in PSPACE, we describe
a polynomial-space algorithm for deciding StrQualNE on Muller SMGs: On input $\mathcal{G}, v_0, x$,
the algorithm starts by computing for each player $i$ with $x_i = 0$ the set of vertices $v$ such
that $\mathrm{val}_i^{\mathcal{G}}(v) = 0$, which can be done in polynomial space by Theorem 2.11. The intersection
of these sets is the domain $Z$ of the Markov decision process $\mathcal{G}(x)$. If $v_0$ is not contained in
this intersection, the algorithm immediately rejects. Otherwise, the algorithm determines
the union $T$ of all end components with payoff $x$ contained in $Z$ by enumerating all subsets
of $Z$, one at a time, and checking which ones are end components with payoff $x$. Finally,
the algorithm computes (in polynomial time) the value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of the MDP $\mathcal{G}(x)$ from $v_0$
and accepts if this value is 1. In all other cases, the algorithm rejects. The correctness of
the algorithm follows immediately from Lemma 5.3. □
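
The subset enumeration in the proof above can be sketched in Python as follows. The sketch assumes the usual definition of an end component (nonempty, strongly connected, every vertex has a successor inside, and stochastic vertices have all their successors inside) and represents each Muller objective as a set of frozensets of colours; all function and parameter names are hypothetical. The running time is exponential, but only one candidate subset is held in memory at a time.

```python
from itertools import combinations


def _reach(start, inside, edges):
    """Vertices of `inside` reachable from `start` without leaving `inside`."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in edges[v]:
            if w in inside and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_end_component(u, succ, stochastic):
    u = set(u)
    if not u:
        return False
    for v in u:
        inside = [w for w in succ[v] if w in u]
        # every vertex needs a successor in u; random moves must stay in u
        if not inside or (v in stochastic and len(inside) < len(succ[v])):
            return False
    pred = {v: [w for w in u if v in succ[w]] for v in u}
    root = next(iter(u))
    # strongly connected: root reaches everything and is reached by everything
    return _reach(root, u, succ) == u and _reach(root, u, pred) == u


def has_payoff(u, colour, muller, x):
    seen = frozenset(colour[v] for v in u)
    return all((seen in muller[i]) == bool(x[i]) for i in range(len(x)))


def target_set(z, succ, stochastic, colour, muller, x):
    """Union of all end components inside z with payoff x (brute force)."""
    z = sorted(z)
    t = set()
    for k in range(1, len(z) + 1):
        for cand in combinations(z, k):  # one candidate subset at a time
            if (is_end_component(cand, succ, stochastic)
                    and has_payoff(cand, colour, muller, x)):
                t |= set(cand)
    return t


succ = {"a": ["b"], "b": ["a", "c"], "c": ["c"]}
colour = {"a": 0, "b": 1, "c": 2}
muller = [{frozenset({0, 1})}]  # the single player wins iff exactly colours 0, 1 recur
t_win = target_set(succ.keys(), succ, set(), colour, muller, (1,))
t_lose = target_set(succ.keys(), succ, set(), colour, muller, (0,))
```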

For games with Streett objectives, StrQualNE becomes NP-complete; we start by proving
the upper bound.

Theorem 5.5. StrQualNE is in NP for Streett SMGs.

Proof. We describe a nondeterministic polynomial-time algorithm for solving StrQualNE:
On input $\mathcal{G}, v_0, x$, the algorithm starts by guessing a subarena $Z' \subseteq V$ and for each player $i$
with $x_i = 0$ a positional strategy $\tau_i$ of the coalition $\Pi \setminus \{i\}$ in the coalition game $\mathcal{G}_i$.
In the next step, the algorithm checks (in polynomial time) whether $\mathrm{val}^{\tau_i}(v) = 1$ for each
vertex $v \in Z'$ and each player $i$ with $x_i = 0$. If not, the algorithm rejects immediately.
Otherwise, the algorithm proceeds by guessing (at most) $n := |V|$ subsets $U_1, \ldots, U_n \subseteq Z'$
and checks whether they are end components with payoff $x$ (which can be done in polynomial
time). If yes, the algorithm sets $T' := \bigcup_{j=1}^{n} U_j$ and computes (in polynomial time) the
value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of the MDP $\mathcal{G}(x)$ from $v_0$ with $Z'$ substituted for $Z$ and $T'$ substituted
for $T$. If this value equals 1, the algorithm accepts; otherwise, it rejects.

Figure 10. Reducing SAT to StrQualNE for games with Streett objectives.

It remains to be shown that the algorithm is correct: On the one hand, if $(\mathcal{G}, v_0)$
has a Nash equilibrium with payoff $x$, then the run of the algorithm where it guesses
$Z' = Z$, globally optimal positional strategies $\tau_i$ (which exist by Theorem 2.6) and end
components $U_i$ such that $T' = T$ will be accepting since then, by Lemma 5.3, $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$.
On the other hand, in any accepting run of the algorithm we have $Z' \subseteq Z$ and $T' \subseteq T$,
and the computed value cannot be higher than $\mathrm{val}^{\mathcal{G}(x)}(v_0)$; hence, $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$, and
Lemma 5.3 guarantees the existence of a Nash equilibrium with payoff $x$. □

The matching lower bound holds even for deterministic two-player Streett games
and was established in \5t



Theorem 5.6. StrQualNE is NP-hard for deterministic two-player Streett games. 

Proof. The proof is a variant of the proof for NP-hardness of the qualitative
decision problem for deterministic two-player zero-sum Rabin-Streett games [29] and proceeds by a
reduction from SAT. Given a Boolean formula $\varphi = C_1 \wedge \cdots \wedge C_m$ in conjunctive normal
form, where without loss of generality $m \geq 1$ and each clause is nonempty, we construct a
deterministic two-player Streett game $\mathcal{G}$ as follows: For each clause $C$, the game $\mathcal{G}$ has a
vertex $C$, which is controlled by player 0, and for each literal $L$ occurring in $\varphi$, there is a
vertex $L$, which is controlled by player 1. There are edges from a clause to each literal that
occurs in this clause, and from a literal to each clause occurring in $\varphi$. The structure of the
game is depicted in Fig. 10. Player 0's objective is given by the empty Streett objective,
i.e. she wins every play of the game, whereas player 1's objective consists of all Streett
pairs of the form $(\{X\}, \{\neg X\})$ or $(\{\neg X\}, \{X\})$, i.e. she wins if, for each variable $X$, either
$X$ and $\neg X$ are both visited infinitely often or neither of them is.

Clearly, $\mathcal{G}$ can be constructed from $\varphi$ in polynomial time. We claim that $\varphi$ is satisfiable
if and only if $(\mathcal{G}, C_1)$ has a Nash equilibrium with payoff $(1, 0)$.

($\Rightarrow$) Assume that $\varphi$ is satisfiable, and consider the following positional strategy $\sigma_0$ of
player 0: whenever the play reaches a clause, $\sigma_0$ plays to a literal that is mapped to
true by the satisfying assignment. This strategy ensures that for each variable $X$ at most
one of the literals $X$ or $\neg X$ is visited infinitely often. Hence, $(\sigma_0, \sigma_1)$ is a Nash equilibrium
of $(\mathcal{G}, C_1)$ with payoff $(1, 0)$ for every strategy $\sigma_1$ of player 1.

($\Leftarrow$) Let $(\sigma_0, \sigma_1)$ be a Nash equilibrium of $(\mathcal{G}, C_1)$ with payoff $(1, 0)$, and assume that $\varphi$ is
not satisfiable. Consider the two-player zero-sum Rabin-Streett game $\mathcal{G}_1$, which is derived
from $\mathcal{G}$ by setting player 0's objective to the complement of player 1's objective. We claim
that player 1 has a winning strategy in $(\mathcal{G}_1, C_1)$, which she could use to improve her payoff
in $(\mathcal{G}, C_1)$, a contradiction to $(\sigma_0, \sigma_1)$ being a Nash equilibrium. By determinacy, we only
need to show that player 0 does not have a winning strategy. Let $\tau$ be an optimal positional
strategy of player 0 in $(\mathcal{G}_1, C_1)$ (which exists by Theorem 2.6). Since $\varphi$ is unsatisfiable,
there must exist a variable $X$ and clauses $C$ and $C'$ such that $\tau(C) = X$ and $\tau(C') = \neg X$.
But player 1 can counter this strategy by playing from $X$ to $C'$ and from any other literal
to $C$. Hence, $\tau$ is not winning. □
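
To make the reduction concrete, the following Python sketch builds the arena and player 1's Streett pairs from a CNF formula and derives the positional strategy of player 0 used in the ($\Rightarrow$) direction. The encoding is an assumption made for illustration only: literals are nonzero integers (DIMACS style, $-k$ standing for $\neg X_k$), and vertices are tagged tuples. A Streett pair may mention a literal that has no vertex in the arena; such a set is simply never visited.

```python
def build_streett_game(clauses):
    """clauses: list of lists of nonzero ints, -k meaning "not X_k"."""
    clause_vs = [("clause", j) for j in range(len(clauses))]
    literals = sorted({lit for clause in clauses for lit in clause})
    # clause vertices belong to player 0, literal vertices to player 1
    owner = {v: 0 for v in clause_vs}
    owner.update({("lit", lit): 1 for lit in literals})
    # clause -> its literals, literal -> every clause
    succ = {("clause", j): [("lit", lit) for lit in clause]
            for j, clause in enumerate(clauses)}
    for lit in literals:
        succ[("lit", lit)] = list(clause_vs)
    # Streett pairs of player 1: ({X}, {not X}) and ({not X}, {X})
    pairs = []
    for var in sorted({abs(lit) for lit in literals}):
        pairs.append(({("lit", var)}, {("lit", -var)}))
        pairs.append(({("lit", -var)}, {("lit", var)}))
    return owner, succ, pairs


def strategy_from_assignment(clauses, assignment):
    """Positional strategy of player 0: in each clause pick a true literal."""
    strategy = {}
    for j, clause in enumerate(clauses):
        true_lits = [lit for lit in clause if assignment[abs(lit)] == (lit > 0)]
        if not true_lits:
            raise ValueError("assignment does not satisfy clause %d" % j)
        strategy[("clause", j)] = ("lit", true_lits[0])
    return strategy


clauses = [[1, -2], [2, 3]]  # (X1 or not X2) and (X2 or X3)
owner, succ, pairs = build_streett_game(clauses)
sigma0 = strategy_from_assignment(clauses, {1: True, 2: True, 3: False})
chosen = {v[1] for v in sigma0.values()}
```

Because the strategy only ever picks literals that are true under one assignment, `chosen` never contains a literal together with its negation, which is exactly why player 1 cannot satisfy her Streett pairs against it.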

For games with Rabin objectives, the situation is more delicate. One might think
that, because of the duality of Rabin and Streett objectives, StrQualNE is in coNP for
SMGs with Rabin objectives.¹ However, as we will see later, this is rather unlikely, and
we can only show that the problem lies in the class $\mathrm{P}^{\mathrm{NP}[\log]}$ of problems solvable by a
deterministic polynomial-time algorithm that may perform a logarithmic number of queries
to an NP oracle. In fact, the same upper bound holds for games with a Streett or a Rabin
objective for each player.

Theorem 5.7. StrQualNE is in $\mathrm{P}^{\mathrm{NP}[\log]}$ for Streett-Rabin SMGs.

Proof. Let us describe a polynomial-time algorithm performing a logarithmic number of
queries to an NP oracle for the problem. On input $\mathcal{G}, v_0, x$, the algorithm starts by determining
for each vertex $v$ and each Rabin player $i$ with $x_i = 0$ whether $\mathrm{val}_i^{\mathcal{G}}(v) = 0$.
Naively implemented, this requires a super-logarithmic number of queries to the oracle.
To reduce the number of queries, we use a neat trick, due to Hemachandra js^. Let us
denote by $R$ and $S$ the set of players $i \in \Pi$ with $x_i = 0$ who have a Rabin, respectively
a Streett objective. Instead of looping through all pairs of a vertex and a player, we start
by determining the number $r$ of all pairs $(v, i)$ such that $i \in R$ and $\mathrm{val}_i^{\mathcal{G}}(v) = 0$. It is not
difficult to see that this number can be computed using binary search by performing only
a logarithmic number of queries to an NP oracle, which we can use for deciding whether
$\mathrm{val}_i^{\mathcal{G}}(v) > 0$ (Corollary [mO).

Then we perform one more query; we ask whether for each
player $i \in R \cup S$ there exists a set $Z_i \subseteq V$ as well as sets $U_1, \ldots, U_{|V|} \subseteq V$ and positional
strategies $(\sigma_i)_{i \in R}$ and $(\tau_i)_{i \in S}$, where $\sigma_i$ is a strategy of player $i$ and $\tau_i$ is a strategy of the
coalition $\Pi \setminus \{i\}$ in the coalition game $\mathcal{G}_i$, with the following properties:

(1) $Z := \bigcap_{i \in R \cup S} Z_i$ is a subarena of $\mathcal{G}$ with $v_0 \in Z$, and $\sum_{i \in R} |Z_i| = r$;

(2) $\mathrm{val}^{\sigma_i}(v) > 0$ for each player $i \in R$ and each $v \in V \setminus Z_i$;

(3) $\mathrm{val}^{\tau_i}(v) = 1$ for each player $i \in S$ and each $v \in Z_i$;

(4) each $U_j$ is an end component of $\mathcal{G} \upharpoonright Z$ with payoff $x$;

(5) the value from $v_0$ of the MDP that is obtained from $\mathcal{G}$ by restricting to vertices inside $Z$
and imposing the objective $\mathrm{Reach}(U_1 \cup \cdots \cup U_{|V|})$ equals 1.

This query can be decided by an NP oracle by guessing suitable sets and strategies and
verifying (1)-(5) in polynomial time. If the answer to the query is yes, the algorithm accepts;
otherwise it rejects.



¹ In fact, Ummels and Wojtczak ^B^] claimed that the problem is in coNP.

Obviously, the algorithm runs in polynomial time. To see that the algorithm is correct,
first note that for each player $i \in R$ the set $Z_i$ does not only include all $v \in V$ such that
$\mathrm{val}_i^{\mathcal{G}}(v) = 0$, but also excludes all other vertices. Otherwise, there would exist a vertex
$v \in Z_i$ with $\mathrm{val}_i^{\mathcal{G}}(v) > 0$. But then the number of pairs $(v, i)$ with $i \in R$ and $\mathrm{val}_i^{\mathcal{G}}(v) = 0$
would be strictly less than $r$, a contradiction. Now, the correctness of the algorithm follows
with the same reasoning as in the proof of Theorem 5.5. □
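
The counting step in the proof above can be illustrated by a small Python sketch. It assumes a monotone oracle `at_least(k)` answering whether at least $k$ pairs $(v, i)$ with $i \in R$ have positive value (an NP question, since one can guess $k$ pairs together with witnesses); the number $r$ of pairs with value 0 is then the total number of pairs minus the count found. The oracle here is a stand-in lambda, not a real NP solver, and all names are hypothetical.

```python
def count_by_binary_search(max_count, at_least):
    """Largest k in [0, max_count] with at_least(k) true.

    Assumes at_least is monotone (true up to some threshold, false above)
    and that at_least(0) holds trivially. Returns (k, number_of_queries).
    """
    lo, hi = 0, max_count
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        queries += 1
        if at_least(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo, queries


# Toy instance: 8 pairs (v, i) in total, 3 of which have positive value.
total_pairs = 8
positive, queries = count_by_binary_search(total_pairs, lambda k: k <= 3)
r = total_pairs - positive  # pairs with value 0
```

Each iteration halves the search interval, so the number of oracle queries is logarithmic in the number of pairs, in contrast to one query per pair in the naive approach.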

Remark 5.8. For a bounded number of players, StrQualNE is in coNP for SMGs with Rabin 
objectives. 

Regarding lower bounds for StrQualNE in SMGs with Rabin objectives, we start by 
proving that the problem is coNP-hard, even for deterministic two-player games. 

Theorem 5.9. StrQualNE is coNP-hard for deterministic two-player Rabin games. 



Proof. The proof is similar to the proof of Theorem 5.6 and is accomplished by a reduction
from the unsatisfiability problem for Boolean formulae in conjunctive normal form. Given
a Boolean formula $\varphi = C_1 \wedge \cdots \wedge C_m$ in conjunctive normal form, where without loss of
generality $m \geq 1$ and each clause is nonempty, we construct a deterministic two-player
Rabin game $\mathcal{G}$ as follows. The arena of $\mathcal{G}$ is the same as in the proof of Theorem 5.6,
depicted in Fig. 10. However, this time player 1 wins every play of the game (her objective
consists of the single Rabin pair $(V, \emptyset)$), and player 0's objective consists of all Rabin pairs
of the form $(\{X\}, \{\neg X\})$ or $(\{\neg X\}, \{X\})$.

Clearly, $\mathcal{G}$ can be constructed from $\varphi$ in polynomial time. We claim that $\varphi$ is
unsatisfiable if and only if $(\mathcal{G}, C_1)$ has a Nash equilibrium with payoff $(0, 1)$.

($\Rightarrow$) Assume that $\varphi$ is unsatisfiable, and consider the two-player zero-sum Rabin-Streett
game $\mathcal{G}_0$, which is derived from $\mathcal{G}$ by setting player 1's objective to the complement of
player 0's objective. Let $\sigma_1$ be a globally optimal strategy for player 1 in this game. We
claim that $\sigma_1$ is winning in $(\mathcal{G}_0, C_1)$; consequently, $(\sigma_0, \sigma_1)$ is a Nash equilibrium of $(\mathcal{G}, C_1)$
with payoff $(0, 1)$ for every strategy $\sigma_0$ of player 0. To prove the claim, suppose otherwise.
Then player 0 would have a positional winning strategy in $(\mathcal{G}_0, C_1)$. But a positional strategy $\tau$
of player 0 picks for each clause a literal contained in this clause. Since $\varphi$ is unsatisfiable,
there must exist a variable $X$ and clauses $C$ and $C'$ such that $\tau(C) = X$ and $\tau(C') = \neg X$.
Player 1 could counter this strategy by playing from $X$ to $C'$ and from any other literal to $C$,
a contradiction.

($\Leftarrow$) Let $(\sigma_0, \sigma_1)$ be a Nash equilibrium of $(\mathcal{G}, C_1)$ with payoff $(0, 1)$, and assume that
$\varphi$ is satisfiable. Consider the following positional strategy $\tau$ of player 0: whenever the play
reaches a clause, $\tau$ plays to a literal that is mapped to true by the satisfying assignment.
This strategy ensures that for each variable $X$ at most one of the literals $X$ or $\neg X$ is visited
infinitely often. Since the construction of $\mathcal{G}$ ensures that, under any strategy profile, at least
one literal is visited infinitely often, $\tau$ ensures a winning play for player 0. Hence, player 0
can improve her payoff by playing $\tau$ instead of $\sigma_0$, a contradiction to the fact that $(\sigma_0, \sigma_1)$
is a Nash equilibrium. □

The next result shows that StrQualNE is not only coNP-hard for Rabin games, but 
also NP-hard. In fact, it is even NP-hard to decide whether in a deterministic Rabin game 
there exists a play that fulfils the objective of each player. 

Proposition 5.10. The problem of deciding, given a deterministic Rabin game, whether
there exists a play that is won by each player is NP-hard.

Proof. We reduce from SAT: given a Boolean formula $\varphi = C_1 \wedge \cdots \wedge C_m$ in conjunctive
normal form over propositional variables $X_1, \ldots, X_n$, where without loss of generality $m \geq 1$
and each clause is nonempty, we show how to construct in polynomial time a deterministic
$(n + 1)$-player Rabin game $\mathcal{G}$ such that $\varphi$ is satisfiable if and only if there exists a play of $\mathcal{G}$
that is won by each player. The game has vertices $C_1, \ldots, C_m$ and, for each clause $C$ and
each literal $L$ that occurs in $C$, a vertex $(C, L)$. All vertices are controlled by player 0.
There are edges from a clause $C_j$ to each vertex $(C_j, L)$ such that $L$ occurs in $C_j$ and from
there to $C_{(j \bmod m) + 1}$. The arena of $\mathcal{G}$ is schematically depicted in Fig. [UJ. The Rabin
objectives are defined as follows:

- player 0 wins every play of $\mathcal{G}$;

- player $i \neq 0$ wins if each vertex of the form $(C, X_i)$ is visited only finitely often or each
vertex of the form $(C, \neg X_i)$ is visited only finitely often.

Clearly, $\mathcal{G}$ can be constructed from $\varphi$ in polynomial time. To establish the reduction,
we need to show that $\varphi$ is satisfiable if and only if there exists a play of $\mathcal{G}$ that is won by
each player.

($\Rightarrow$) Assume that $\alpha \colon \{X_1, \ldots, X_n\} \to \{\text{true}, \text{false}\}$ is a satisfying assignment of $\varphi$.
Clearly, the positional strategy of player 0 where from each clause $C$ she plays to a fixed
vertex $(C, L)$ such that $L$ is mapped to true by $\alpha$ induces a play that is won by each player.

($\Leftarrow$) Assume that there exists a play $\pi$ of $\mathcal{G}$ that is won by each player. Obviously,
it is not possible that both a vertex $(C, X_i)$ and a vertex $(C', \neg X_i)$ are visited infinitely
often in $\pi$ since this would violate player $i$'s objective. Consider the variable assignment
that maps $X$ to true if some vertex $(C, X)$ is visited infinitely often in $\pi$. This assignment
satisfies the formula because, by the construction of $\mathcal{G}$, for each clause $C$ there exists a
literal $L$ in $C$ such that the vertex $(C, L)$ is visited infinitely often in $\pi$. □
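
The construction in the proof of Proposition 5.10 can be sketched in Python as follows. As an illustrative assumption, clauses are numbered from 0 (so the edge to $C_{(j \bmod m)+1}$ becomes `(j + 1) % m`), literals are nonzero integers with $-k$ standing for $\neg X_k$, and a play is summarised by the literal picked in each clause along the cycle; none of these names come from the paper.

```python
def build_rabin_arena(clauses):
    """All vertices belong to player 0; the play cycles through the clauses."""
    m = len(clauses)
    succ = {}
    for j, clause in enumerate(clauses):
        succ[("clause", j)] = [("pick", j, lit) for lit in clause]
        for lit in clause:
            succ[("pick", j, lit)] = [("clause", (j + 1) % m)]
    return succ


def recurring_vertices(choice):
    """Vertices visited infinitely often when choice[j] is picked in clause j."""
    inf = set()
    for j, lit in enumerate(choice):
        inf.add(("clause", j))
        inf.add(("pick", j, lit))
    return inf


def won_by_all(inf, num_vars):
    """Player k >= 1 wins iff X_k-vertices or (not X_k)-vertices do not recur."""
    lits = {v[2] for v in inf if v[0] == "pick"}
    return all(not (k in lits and -k in lits) for k in range(1, num_vars + 1))


def assignment_from_play(inf, num_vars):
    """X_k is true iff some vertex (C, X_k) is visited infinitely often."""
    lits = {v[2] for v in inf if v[0] == "pick"}
    return {k: k in lits for k in range(1, num_vars + 1)}


clauses = [[1, 2], [-1, 2], [-2, 3]]
arena = build_rabin_arena(clauses)
good_play = recurring_vertices([1, 2, 3])
bad_play = recurring_vertices([1, -1, 3])  # X1 and not X1 both recur
alpha = assignment_from_play(good_play, 3)
satisfied = all(any(alpha[abs(l)] == (l > 0) for l in c) for c in clauses)
```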

It follows from Theorem 5.9 and Proposition 5.10 that, unless NP = coNP, StrQualNE
is not contained in NP $\cup$ coNP, even for deterministic Rabin games. With a little more
effort, one can show that StrQualNE is DP-hard for deterministic Rabin games (see [59]).
Finally, for stochastic Rabin games, we can show that StrQualNE is $\mathrm{P}^{\mathrm{NP}[\log]}$-complete.

Theorem 5.11. StrQualNE is $\mathrm{P}^{\mathrm{NP}[\log]}$-hard for Rabin SMGs.

Proof. Wagner [g^] and, independently, Buss and Hay [11] showed that $\mathrm{P}^{\mathrm{NP}[\log]}$ is the closure
of NP with respect to polynomial-time Boolean formula reducibility. The canonical complete
problem for this class is to decide, given a Boolean combination $\alpha$ of statements of the form
"$\varphi$ is satisfiable", where $\varphi$ ranges over all Boolean formulae, whether $\alpha$ evaluates to true.
We claim that for every such statement $\alpha$ we can construct in polynomial time a Rabin
SMG $(\mathcal{G}, v_0)$ such that $\alpha$ evaluates to true if and only if $(\mathcal{G}, v_0)$ has a Nash equilibrium
with payoff $(0, 1, \ldots, 1)$. The game $\mathcal{G}$ is constructed by induction on the complexity of $\alpha$;
without loss of generality, we assume that negations are only applied to atoms. If $\alpha$ is of
the form "$\varphi$ is satisfiable" or "$\varphi$ is not satisfiable", then the existence of a suitable game $\mathcal{G}$
follows from Proposition 5.10 or Theorem 5.9, respectively.

Now, let $\alpha = \alpha_1 \wedge \alpha_2$, and assume that we already have constructed suitable games
$(\mathcal{G}_1, v_1)$ and $(\mathcal{G}_2, v_2)$, played by the same players $0, 1, \ldots, n$. The game $\mathcal{G}$ is the disjoint
union of $\mathcal{G}_1$ and $\mathcal{G}_2$ combined with one new stochastic vertex $v_0$. From $v_0$, the game moves
with probability $\frac{1}{2}$ each to $v_1$ or $v_2$. Obviously, $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff
$(0, 1, \ldots, 1)$ if and only if both $(\mathcal{G}_1, v_1)$ and $(\mathcal{G}_2, v_2)$ have such an equilibrium.

Finally, let $\alpha = \alpha_1 \vee \alpha_2$, and assume that we already have constructed suitable games
$(\mathcal{G}_1, v_1)$ and $(\mathcal{G}_2, v_2)$, again played by the same players $0, 1, \ldots, n$. As in the previous case,
the game $\mathcal{G}$ is the disjoint union of $\mathcal{G}_1$ and $\mathcal{G}_2$ combined with one new vertex $v_0$, which has
transitions to both $v_1$ and $v_2$. However, this time $v_0$ is controlled by player 1. Obviously,
$(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $(0, 1, \ldots, 1)$ if and only if at least one of the
games $(\mathcal{G}_1, v_1)$ and $(\mathcal{G}_2, v_2)$ has such an equilibrium. □

Our next aim is to prove that StrQualNE is in UP $\cap$ coUP for parity SMGs. We will
make use of Algorithm 5.1, which computes for a game $\mathcal{G}$ with priority functions $(\Omega_i)_{i \in \Pi}$
and $x \in \{0,1\}^{\Pi}$ the union of all end components with payoff $x$. The algorithm is a straightforward
adaptation of the algorithm for computing the union of all winning end components
in a Streett MDP [13]. At the heart of the algorithm lies the procedure FindEC that returns
on input $X \subseteq V$ the union of all end components with payoff $x$ that are contained
in $X$. The procedure starts by computing all end components maximal in $X$. If such an
end component $U$ has payoff $x$, all vertices in $U$ can be added to the result of the procedure.
Otherwise, there exists a player $i$ such that either $x_i = 0$ and the least priority for player $i$
in $U$ is even or $x_i = 1$ and the least priority for player $i$ in $U$ is odd. Each end component
with payoff $x$ inside $U$ must exclude all vertices with this least priority. Hence, we call the
procedure recursively on the subset of $U$ that results from removing these vertices.

Note that on input $X$, the total number of recursive calls to the procedure FindEC is
bounded by $|X|$. Since, additionally, the set of all end components maximal in a set $X$ can
be computed in polynomial time, this proves that Algorithm 5.1 runs in polynomial time.
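
The procedure FindEC described above can be sketched in Python as follows. The sketch rests on two assumptions that should be checked against the paper's definitions: a player wins inside an end component exactly when her least priority there is even (min-parity convention), and end components are nonempty, strongly connected sets that are closed under random moves. The maximal end component routine is a simple polynomial-time variant chosen for brevity, and `priority[i][v]` is a hypothetical encoding of player $i$'s priority of vertex $v$.

```python
def _reach(start, inside, succ):
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in succ[v]:
            if w in inside and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def _sccs(inside, succ):
    """Strongly connected components of the subgraph induced by `inside`."""
    reach = {v: _reach(v, inside, succ) for v in inside}
    remaining, result = set(inside), []
    while remaining:
        v = next(iter(remaining))
        comp = {w for w in reach[v] if v in reach[w]}
        result.append(comp)
        remaining -= comp
    return result


def maximal_end_components(x, succ, stochastic):
    result, work = [], [set(x)]
    while work:
        cand = work.pop()
        for comp in _sccs(cand, succ):
            # vertices that cannot stay in comp, or that may leave it at random
            bad = {v for v in comp
                   if not any(w in comp for w in succ[v])
                   or (v in stochastic and any(w not in comp for w in succ[v]))}
            if not bad:
                result.append(comp)
            elif comp - bad:
                work.append(comp - bad)
    return result


def find_ec(x, succ, stochastic, priority, payoff):
    """Union of all end components inside x with the given binary payoff."""
    result = set()
    for u in maximal_end_components(x, succ, stochastic):
        least = [min(prio[v] for v in u) for prio in priority]
        # player i has the wrong outcome iff (least priority even) != (payoff 1)
        wrong = [i for i, p in enumerate(least) if (p % 2 == 0) != bool(payoff[i])]
        if not wrong:
            result |= u
        else:
            keep = {v for v in u
                    if all(priority[i][v] > least[i] for i in wrong)}
            result |= find_ec(keep, succ, stochastic, priority, payoff)
    return result


succ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["d"]}
priority = [{"a": 1, "b": 2, "c": 2, "d": 1}]  # a single player
t_win = find_ec(succ.keys(), succ, set(), priority, (1,))
t_lose = find_ec(succ.keys(), succ, set(), priority, (0,))
t_random = find_ec(succ.keys(), succ, {"c"}, priority, (1,))
```

In the example with all vertices controlled, the maximal end component {a, b, c} has least priority 1, so for payoff (1,) the vertex a is removed and the recursion finds {b, c}. When c is stochastic, it can escape to d, so no end component with payoff (1,) remains.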



Algorithm 5.1. Finding end components in parity SMGs.

Input: parity SMG $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, \chi, (\Omega_i)_{i \in \Pi})$, $x = (x_i)_{i \in \Pi} \in \{0,1\}^{\Pi}$
Output: $\bigcup \{U \subseteq V : U \text{ is an end component of } \mathcal{G} \text{ with payoff } x\}$

output FindEC($V$)

procedure FindEC($X$)
    $Z := \emptyset$
    compute all end components of $\mathcal{G}$ maximal in $X$
    for each such end component $U$ do
        $P := \{i \in \Pi : \min \Omega_i(\chi(U)) \equiv x_i \bmod 2\}$
        if $P = \emptyset$ then
            (* U is an end component with payoff x *)
            $Z := Z \cup U$
        else
            (* U has the wrong payoff *)
            $Y := \bigcap_{i \in P} \{v \in U : \Omega_i(\chi(v)) > \min \Omega_i(\chi(U))\}$
            $Z := Z \cup$ FindEC($Y$)
        end if
    end for
    return $Z$
end procedure

Theorem 5.12. StrQualNE is in UP $\cap$ coUP for parity SMGs.

Proof. A UP algorithm that decides StrQualNE for parity SMGs works as follows: On input
$\mathcal{G}, v_0, x$, the algorithm starts by guessing, for each player $i$ with $x_i = 0$, the set $Z_i$ of vertices $v$
with $\mathrm{val}_i^{\mathcal{G}}(v) = 0$. Then, for each $v \in V$, the guess whether $v \in Z_i$ or $v \notin Z_i$ is verified
by running the UP algorithm for the respective problem. If some guess was not correct,
the algorithm rejects immediately. Otherwise, it constructs the subarena $Z := \bigcap_{i \in \Pi : x_i = 0} Z_i$
and uses Algorithm 5.1 to determine the union $T$ of all end components with payoff $x$. If
$v_0 \notin Z$, the algorithm rejects immediately. Otherwise, it computes in polynomial time the
value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of the MDP $\mathcal{G}(x)$ from $v_0$. If this value equals 1, the algorithm accepts;
otherwise, it rejects. Analogously, an algorithm for the complement of StrQualNE accepts
if and only if $v_0 \notin Z$ or $\mathrm{val}^{\mathcal{G}(x)}(v_0) < 1$.

Obviously, both algorithms run in polynomial time. Moreover, on each input there
exists at most one accepting run because the algorithms only accept if each of the sets $Z_i$
has been guessed correctly. Finally, their correctness follows from Lemma 5.3. □



Recall from Section l2.d that it is an open question whether the qualitative decision
problem for parity S2Gs admits a polynomial-time algorithm. Such an algorithm would allow
us to compute the domain of the MDP $\mathcal{G}(x)$ efficiently, which would imply that StrQualNE
is in P for parity SMGs. In fact, given a class $\mathcal{C}$ of parity S2Gs for which the qualitative
decision problem is in P, we can easily derive a class of parity SMGs for which StrQualNE
is in P, namely the class $\mathcal{C}^*$ of all parity SMGs such that for each player $i$ the coalition
game $\mathcal{G}_i$ is in $\mathcal{C}$.

Theorem 5.13. Let $\mathcal{C}$ be a class of finite parity S2Gs such that the qualitative decision
problem is decidable in P for games in $\mathcal{C}$. Then StrQualNE is in P for games in $\mathcal{C}^*$.

Proof. Consider the algorithm given in the proof of Theorem 5.12. For each player $i$, the
set $Z_i$ can be computed in polynomial time if $\mathcal{G}_i \in \mathcal{C}$, and there is no need to guess this set.
The resulting deterministic algorithm still runs in polynomial time. □



By Theorem 2.12, for each $d \in \mathbb{N}$, the qualitative decision problem for parity S2Gs with
at most $d$ priorities belongs to P. Hence, it follows from Theorem 5.13 that StrQualNE
is decidable in polynomial time for parity SMGs with at most $d$ priorities. In particular,
StrQualNE is in P for (co-)Büchi SMGs.

Corollary 5.14. For each $d \in \mathbb{N}$, StrQualNE is decidable in polynomial time for parity
SMGs with at most $d$ priorities.



6. Conclusion 

We have analysed the complexity of deciding whether a stochastic multiplayer game with
$\omega$-regular objectives has a Nash equilibrium whose payoff falls into a certain interval. Our
results demonstrate that this problem is more complicated for multiplayer games than
for two-player zero-sum games. In particular, the problem of deciding the existence of
a Nash equilibrium where player 0 wins almost surely is undecidable for simple stochastic
multiplayer games, whereas the same problem is decidable in polynomial time for two-player
zero-sum simple stochastic games. On the positive side, we have shown that the strictly
qualitative fragment of NE has a complexity that is comparable to the complexity of the
qualitative decision problem for two-player zero-sum games.

Several directions for future research come to mind: First, one can study other restrictions
of NE that might be decidable. For example, it is plausible that the restriction of
NE to games with two players is decidable. Second, it would be interesting to extend our
results to other game models such as concurrent games [5^, 13] or games with quantitative
payoff functions.